Commit graph

538 commits

Author SHA1 Message Date
Elijah Newren c339932bd8 repository: remove unnecessary include of path.h
This also made it clear that several .c files that depended upon path.h
were missing a #include for it; add the missing includes while at it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:53 -07:00
Elijah Newren bc5c5ec044 cache.h: remove this no-longer-used header
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.

Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first).  This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:53 -07:00
Elijah Newren 90cbae9ce5 statinfo: move stat_{data,validity} functions from cache/read-cache
These functions do not depend upon struct cache_entry or struct
index_state in any way, and it seems more logical to break them out into
this file, especially since statinfo.h already has the struct stat_data
declaration.

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:53 -07:00
Junio C Hamano cbc882ea38 Merge branch 'jc/pack-ref-exclude-include'
"git pack-refs" learns "--include" and "--exclude" to tweak the ref
hierarchy to be packed using pattern matching.

* jc/pack-ref-exclude-include:
  pack-refs: teach pack-refs --include option
  pack-refs: teach --exclude option to exclude refs from being packed
  docs: clarify git-pack-refs --all will pack all refs
2023-06-13 12:29:45 -07:00
John Cai 4fe42f326e pack-refs: teach pack-refs --include option
Allow users to be more selective over which refs to pack by adding an
--include option to git-pack-refs.

The existing options allow some measure of selectivity. By default
git-pack-refs packs all tags. --all can be used to include all refs,
and the previous commit added the ability to exclude certain refs with
--exclude.

While these knobs give the user some selection over which refs to pack,
it could be useful to give more control. For instance, a repository may
have a set of branches that are rarely updated and would benefit from
being packed. --include would allow the user to easily include a set of
branches to be packed while leaving everything else unpacked.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-12 14:54:14 -07:00
John Cai 826ae79fca pack-refs: teach --exclude option to exclude refs from being packed
At GitLab, we have a system that creates ephemeral internal refs that
don't live long before getting deleted. Having an option to exclude
certain refs from a packed-refs file allows these internal references to
be deleted much more efficiently.

Add an --exclude option to the pack-refs builtin, and use the ref
exclusions API to exclude certain refs from being packed into the final
packed-refs file

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-12 14:54:14 -07:00
Junio C Hamano ccd12a3d6c Merge branch 'en/header-split-cache-h-part-2'
More header clean-up.

* en/header-split-cache-h-part-2: (22 commits)
  reftable: ensure git-compat-util.h is the first (indirect) include
  diff.h: reduce unnecessary includes
  object-store.h: reduce unnecessary includes
  commit.h: reduce unnecessary includes
  fsmonitor: reduce includes of cache.h
  cache.h: remove unnecessary headers
  treewide: remove cache.h inclusion due to previous changes
  cache,tree: move basic name compare functions from read-cache to tree
  cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
  hash-ll.h: split out of hash.h to remove dependency on repository.h
  tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
  dir.h: move DTYPE defines from cache.h
  versioncmp.h: move declarations for versioncmp.c functions from cache.h
  ws.h: move declarations for ws.c functions from cache.h
  match-trees.h: move declarations for match-trees.c functions from cache.h
  pkt-line.h: move declarations for pkt-line.c functions from cache.h
  base85.h: move declarations for base85.c functions from cache.h
  copy.h: move declarations for copy.c functions from cache.h
  server-info.h: move declarations for server-info.c functions from cache.h
  packfile.h: move pack_window and pack_entry from cache.h
  ...
2023-05-09 16:45:46 -07:00
Junio C Hamano d699e27bd4 Merge branch 'tb/ban-strtok'
Mark strtok() and strtok_r() to be banned.

* tb/ban-strtok:
  banned.h: mark `strtok()` and `strtok_r()` as banned
  t/helper/test-json-writer.c: avoid using `strtok()`
  t/helper/test-oidmap.c: avoid using `strtok()`
  t/helper/test-hashmap.c: avoid using `strtok()`
  string-list: introduce `string_list_setlen()`
  string-list: multi-delimiter `string_list_split_in_place()`
2023-05-02 10:13:35 -07:00
Taylor Blau 52acddf36c string-list: multi-delimiter string_list_split_in_place()
Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.

Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters comprised of
any characters in the delimiting set.

When only a single delimiting character is provided, `strpbrk(2)` (which
is implemented with `strcspn(2)`) has equivalent performance to
`strchr(2)`. Modern `strcspn(2)` implementations treat an empty
delimiter or the singleton delimiter as a special case and fall back to
calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)`
this way.

This change is one step to removing `strtok(2)` from the tree. Note that
`string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:

    string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1)

would yield a string list of:

    ["foo", "", "", "bar", "", "", "baz"]

Callers that wish to emulate the behavior of strtok(2) more directly
should call `string_list_remove_empty_items()` after splitting.

To avoid regressions for the new multi-character delimter cases, update
t0063 in this patch as well.

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 16:01:28 -07:00
Elijah Newren d4a4f9291d commit.h: reduce unnecessary includes
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:33 -07:00
Elijah Newren d1cbe1e6d8 hash-ll.h: split out of hash.h to remove dependency on repository.h
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo).  However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id.  Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers.  It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:32 -07:00
Elijah Newren d5fff46f40 copy.h: move declarations for copy.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:31 -07:00
Elijah Newren b7b189cd5a treewide: reduce includes of cache.h in other headers
We had a handful of headers including cache.h that didn't need to
anymore.  Drop those includes and replace them with includes of
smaller files, or forward declarations.  However, note that two .c
files now need to directly include cache.h, though they should have
been including it all along given they are directly using structs
defined in it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:11 -07:00
Elijah Newren 65156bb7ec treewide: remove double forward declaration of read_in_full
cache.h's nature of a dumping ground of includes prevented it from
being included in some compat/ files, forcing us into a workaround
of having a double forward declaration of the read_in_full() function
(see commit 14086b0a13 ("compat/pread.c: Add a forward declaration to
fix a warning", 2007-11-17)).  Now that we have moved functions like
read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered
with unrelated and scary #defines, get rid of the extra forward
declaration and just have compat/pread.c include wrapper.h.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:11 -07:00
Elijah Newren 87bed17907 object-file.h: move declarations for object-file.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
Elijah Newren d48be35ca6 write-or-die.h: move declarations for write-or-die.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:54 -07:00
Elijah Newren e38da487cc setup.h: move declarations for setup.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:54 -07:00
Elijah Newren 32a8f51061 environment.h: move declarations for environment.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:53 -07:00
Elijah Newren d5ebb50dcb wrapper.h: move declarations for wrapper.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:53 -07:00
Elijah Newren 4f6728d52d treewide: remove unnecessary cache.h inclusion from several sources
A number of files were apparently including cache.h solely to get
gettext.h.  By making those files explicitly include gettext.h, we can
already drop the include of cache.h in these files.  On top of that,
there were some files using cache.h that didn't need to for any reason.
Remove these unnecessary includes.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:51 -07:00
Elijah Newren f394e093df treewide: be explicit about dependence on gettext.h
Dozens of files made use of gettext functions, without explicitly
including gettext.h.  This made it more difficult to find which files
could remove a dependence on cache.h.  Make C files explicitly include
gettext.h if they are using it.

However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:51 -07:00
Elijah Newren a6dc3d364c treewide: remove unnecessary cache.h inclusion from a few headers
Ever since a64215b6cd ("object.h: stop depending on cache.h; make
cache.h depend on object.h", 2023-02-24), we have a few headers that
could have replaced their include of cache.h with an include of
object.h.  Make that change now.

Some C files had to start including cache.h after this change (or some
smaller header it had brought in), because the C files were depending
on things from cache.h but were only formerly implicitly getting
cache.h through one of these headers being modified in this patch.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:50 -07:00
Elijah Newren b5fa608180 ident.h: move ident-related declarations out of cache.h
These functions were all defined in a separate ident.c already, so
create ident.h and move the declarations into that file.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:29 -08:00
Elijah Newren 41771fa435 cache.h: remove dependence on hex.h; make other files include it explicitly
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:29 -08:00
Elijah Newren 36bf195890 alloc.h: move ALLOC_GROW() functions from cache.h
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places.  It does mean that we also need to add
includes of alloc.h in a number of C files.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:28 -08:00
Elijah Newren ba3d1c73da treewide: remove unnecessary cache.h includes
We had several header files include cache.h unnecessarily.  Remove
those.  These have all been verified via both ensuring that
    gcc -E $HEADER | grep '"cache.h"'
found no hits and that
    cat >temp.c <<EOF &&
    #include "git-compat-util.h"
    #include "$HEADER"
    int main() {}
    EOF
    gcc -c temp.c
successfully compiles without warnings.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:28 -08:00
Elijah Newren 8bff5ca030 treewide: ensure one of the appropriate headers is sourced first
We had several C files ignoring the rule to include one of the
appropriate headers first; fix that.

While at it, the rule in Documentation/CodingGuidelines about which
header to include has also fallen out of sync, so update the wording to
mention other allowed headers.

Unfortunately, C files in reftable/ don't actually follow the previous
or updated rule.  If you follow the #include chain in its C files,
reftable/system.h _tends_ to be first (i.e. record.c first includes
record.h, which first includes basics.h, which first includees
system.h), but not always (e.g. publicbasics.c includes another header
first that does not include system.h).  However, I'm going to punt on
making actual changes to the C files in reftable/ since I do not want to
risk bringing it out-of-sync with any version being used externally.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:28 -08:00
Junio C Hamano 3ed91c5f22 Merge branch 'ps/fsync-refs-fix'
Fix the sequence to fsync $GIT_DIR/packed-refs file that forgot to
flush its output to the disk..

* ps/fsync-refs-fix:
  refs: fix corruption by not correctly syncing packed-refs to disk
2023-01-02 21:37:19 +09:00
Patrick Steinhardt ce54672f9b refs: fix corruption by not correctly syncing packed-refs to disk
At GitLab we have recently received a report where a repository was left
with a corrupted `packed-refs` file after the node hard-crashed even
though `core.fsync=reference` was set. This is something that in theory
should not happen if we correctly did the atomic-rename dance to:

    1. Write the data into a temporary file.

    2. Synchronize the temporary file to disk.

    3. Rename the temporary file into place.

So if we crash in the middle of writing the `packed-refs` file we should
only ever see either the old or the new state of the file.

And while we do the dance when writing the `packed-refs` file, there is
indeed one gotcha: we use a `FILE *` stream to write the temporary file,
but don't flush it before synchronizing it to disk. As a consequence any
data that is still buffered will not get synchronized and a crash of the
machine may cause corruption.

Fix this bug by flushing the file stream before we fsync.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:18:12 +09:00
Han-Wen Nienhuys 71e5473493 refs: unify parse_worktree_ref() and ref_type()
The logic to handle worktree refs (worktrees/NAME/REF and
main-worktree/REF) existed in two places:

* ref_type() in refs.c

* parse_worktree_ref() in worktree.c

Collapse this logic together in one function parse_worktree_ref():
this avoids having to cross-check the result of parse_worktree_ref()
and ref_type().

Introduce enum ref_worktree_type, which is slightly different from
enum ref_type. The latter is a misleading name (one would think that
'ref_type' would have the symref option).

Instead, enum ref_worktree_type only makes explicit how a refname
relates to a worktree. From this point of view, HEAD and
refs/bisect/abc are the same: they specify the current worktree
implicitly.

The files-backend must avoid packing refs/bisect/* and friends into
packed-refs, so expose is_per_worktree_ref() separately.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-19 11:11:11 -07:00
Ævar Arnfjörð Bjarmason 5cf88fd8b0 git-compat-util.h: use "UNUSED", not "UNUSED(var)"
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75d (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.

Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.

This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:49:48 -07:00
Jeff King 7718827a2d refs: mark unused virtual method parameters
The refs code uses various polymorphic types (e.g., loose vs packed
ref_stores, abstracted iterators). Not every virtual function or
callback needs all of its parameters. Let's mark the unused ones to
quiet -Wunused-parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:55 -07:00
Jeff King 63e14ee2d6 refs: mark unused each_ref_fn parameters
Functions used with for_each_ref(), etc, need to conform to the
each_ref_fn interface. But most of them don't need every parameter;
let's annotate the unused ones to quiet -Wunused-parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:54 -07:00
Junio C Hamano 538dc459a0 Merge branch 'ep/maint-equals-null-cocci'
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano 2b0a58d164 Merge branch 'ep/maint-equals-null-cocci' for maint-2.35
* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-02 10:06:04 -07:00
Junio C Hamano afe8a9070b tree-wide: apply equals-null.cocci
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-02 09:50:37 -07:00
Junio C Hamano c6da34a610 Revert "Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'"
This reverts commit 991b4d47f0, reversing
changes made to bcd020f88e.
2022-04-13 15:51:33 -07:00
Junio C Hamano 3d8046a820 Merge branch 'ab/refs-various-fixes'
Code clean-up.

* ab/refs-various-fixes:
  refs debug: add a wrapper for "read_symbolic_ref"
  packed-backend: remove stub BUG(...) functions
  misc *.c: use designated initializers for struct assignments
  refs: use designated initializers for "struct ref_iterator_vtable"
  refs: use designated initializers for "struct ref_storage_be"
2022-03-29 12:22:02 -07:00
Junio C Hamano 6e1a8952e9 Merge branch 'ps/fsync-refs'
Updates to refs traditionally weren't fsync'ed, but we can
configure using core.fsync variable to do so.

* ps/fsync-refs:
  core.fsync: new option to harden references
2022-03-25 16:38:25 -07:00
Ævar Arnfjörð Bjarmason 5b8754043c refs debug: add a wrapper for "read_symbolic_ref"
In cd475b3b03 (refs: add ability for backends to special-case reading
of symbolic refs, 2022-03-01) when the "read_symbolic_ref" callback
was added we'd fall back on "refs_read_raw_ref" if there wasn't any
backend implementation of "read_symbolic_ref".

As discussed in the preceding commit this would only happen if we were
running the "debug" backend, e.g. in the "setup for ref completion"
test in t9902-completion.sh with:

    GIT_TRACE_REFS=1 git fetch --no-tags other

Let's improve the trace output, but and also eliminate the
now-redundant refs_read_raw_ref() fallback case. As noted in the
preceding commit the "packed" backend will never call
refs_read_symbolic_ref() (nor is it ever going to). For any future
backend such as reftable it's OK to ask that they either implement
this (or a wrapper) themselves.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17 10:40:14 -07:00
Ævar Arnfjörð Bjarmason ca40893a41 packed-backend: remove stub BUG(...) functions
Remove the stub BUG(...) functions previously used by the "struct
ref_storage_be refs_be_packed" backend.

We never call any functions in the packed backend by using it as a
"normal" primary ref store, instead we'll always initialize a "files"
backend ref-store.

It will then via the "packed_ref_store" member of "struct
files_ref_store" call selected functions in the "packed" backend, and
we'll in addition call others via wrappers in refs.c.

So while these would arguably give us *slightly* more meaningful error
messages we'll NULL the missing members in the initializer anyway, so
we'll reliably get a segfault if we're ever changing the backend and
having it call something it doesn't have.

So there's no need for this verbose boilerplate, and as shown in a
subsequent commit it might even lead to some confusion about the
packed backend being a "real" backend. Let's make it clear that it's
not.

As an aside, this also fixes a warning emitted by SunCC in at least
versions 12.5 and 12.6 of Oracle Developer Studio:

    "refs/packed-backend.c", line 1599: warning: Function has no return statement : packed_create_symref
    "refs/packed-backend.c", line 1606: warning: Function has no return statement : packed_rename_ref)
    "refs/packed-backend.c", line 1613: warning: Function has no return statement : packed_copy_ref
    "refs/packed-backend.c", line 1648: warning: Function has no return statement : packed_create_reflog

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17 10:38:05 -07:00
Ævar Arnfjörð Bjarmason e2f8acb6a0 refs: use designated initializers for "struct ref_iterator_vtable"
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17 10:36:11 -07:00
Ævar Arnfjörð Bjarmason 32bff617c6 refs: use designated initializers for "struct ref_storage_be"
Change the definition of the three refs backends we currently carry to
use designated initializers.

The "= NULL" assignments being retained here are redundant, and could
be removed, but let's keep them for clarity. All of these backends
define almost all fields, so we're not saving much in terms of line
count by omitting these, but e.g. for "refs_be_debug" it's immediately
apparent that we're omitting "init" when comparing its assignment to
the others.

This is a follow-up to similar work merged in bd4232fac3 (Merge
branch 'ab/struct-init', 2021-07-16), a4b9fb6a5c (Merge branch
'ab/designated-initializers-more', 2021-10-18) and a30321b9ea (Merge
branch 'ab/designated-initializers' into
ab/designated-initializers-more, 2021-09-27).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17 10:36:04 -07:00
Patrick Steinhardt bc22d845c4 core.fsync: new option to harden references
When writing both loose and packed references to disk we first create a
lockfile, write the updated values into that lockfile, and on commit we
rename the file into place. According to filesystem developers, this
behaviour is broken because applications should always sync data to disk
before doing the final rename to ensure data consistency [1][2][3]. If
applications fail to do this correctly, a hard crash of the machine can
easily result in corrupted on-disk data.

This kind of corruption can in fact be easily observed with Git when the
machine hard-resets shortly after writing references to disk. On
machines with ext4, this will likely lead to the "empty files" problem:
the file has been renamed, but its data has not been synced to disk. The
result is that the reference is corrupt, and in the worst case this can
lead to data loss.

Implement a new option to harden references so that users and admins can
avoid this scenario by syncing locked loose and packed references to
disk before we rename them into place.

[1]: https://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
[2]: https://btrfs.wiki.kernel.org/index.php/FAQ (What are the crash guarantees of overwrite-by-rename)
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/ext4.rst (see auto_da_alloc)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-15 13:30:58 -07:00
Patrick Steinhardt 0a7b38707d refs/files-backend: optimize reading of symbolic refs
When reading references via `files_read_raw_ref()` we always consult
both the loose reference, and if that wasn't found, we also consult the
packed-refs file. While this makes sense to read a generic reference, it
is wasteful in the case where we only care about symbolic references:
the packed-refs backend does not support them, and thus it cannot ever
return one for us.

Special-case reading of symbolic references for the files backend such
that we always skip asking the packed-refs backend.

We use `refs_read_symbolic_ref()` extensively to determine whether we
need to skip updating local symbolic references during a fetch, which is
why the change results in a significant speedup when doing fetches in
repositories with huge numbers of references. The following benchmark
executes a mirror-fetch in a repository with about 2 million references
via `git fetch --prune --no-write-fetch-head +refs/*:refs/*`:

    Benchmark 1: HEAD~
      Time (mean ± σ):     68.372 s ±  2.344 s    [User: 65.629 s, System: 8.786 s]
      Range (min … max):   65.745 s … 70.246 s    3 runs

    Benchmark 2: HEAD
      Time (mean ± σ):     60.259 s ±  0.343 s    [User: 61.019 s, System: 7.245 s]
      Range (min … max):   60.003 s … 60.649 s    3 runs

    Summary
      'HEAD' ran
        1.13 ± 0.04 times faster than 'HEAD~'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 10:13:46 -08:00
Patrick Steinhardt cd475b3b03 refs: add ability for backends to special-case reading of symbolic refs
Reading of symbolic and non-symbolic references is currently treated the
same in reference backends: we always call `refs_read_raw_ref()` and
then decide based on the returned flags what type it is. This has one
downside though: symbolic references may be treated different from
normal references in a backend from normal references. The packed-refs
backend for example doesn't even know about symbolic references, and as
a result it is pointless to even ask it for one.

There are cases where we really only care about whether a reference is
symbolic or not, but don't care about whether it exists at all or may be
a non-symbolic reference. But it is not possible to optimize for this
case right now, and as a consequence we will always first check for a
loose reference to exist, and if it doesn't, we'll query the packed-refs
backend for a known-to-not-be-symbolic reference. This is inefficient
and requires us to search all packed references even though we know to
not care for the result at all.

Introduce a new function `refs_read_symbolic_ref()` which allows us to
fix this case. This function will only ever return symbolic references
and can thus optimize for the scenario layed out above. By default, if
the backend doesn't provide an implementation for it, we just use the
old code path and fall back to `read_raw_ref()`. But in case the backend
provides its own, more efficient implementation, we will use that one
instead.

Note that this function is explicitly designed to not distinguish
between missing references and non-symbolic references. If it did, we'd
be forced to always search the packed-refs backend to see whether the
symbolic reference the user asked for really doesn't exist, or if it
exists as a non-symbolic reference.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 10:13:46 -08:00
Junio C Hamano 991b4d47f0 Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'
Because a deletion of ref would need to remove it from both the
loose ref store and the packed ref store, a delete-ref operation
that logically removes one ref may end up invoking ref-transaction
hook twice, which has been corrected.

* ps/avoid-unnecessary-hook-invocation-with-packed-refs:
  refs: skip hooks when deleting uncovered packed refs
  refs: do not execute reference-transaction hook on packing refs
  refs: demonstrate excessive execution of the reference-transaction hook
  refs: allow skipping the reference-transaction hook
  refs: allow passing flags when beginning transactions
  refs: extract packed_refs_delete_refs() to allow control of transaction
2022-02-18 13:53:27 -08:00
Ævar Arnfjörð Bjarmason ce14de03db refs API: remove "failure_errno" from refs_resolve_ref_unsafe()
Remove the now-unused "failure_errno" parameter from the
refs_resolve_ref_unsafe() signature. In my recent 96f6623ada (Merge
branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its
callers explicitly request the errno via an output parameter.

As that series shows all but one caller ended up passing in a
boilerplate "ignore_errno", since they only cared about whether the
return value was NULL or not, i.e. if the ref could be resolved.

There was one small issue with that series fixed with a follow-up in
31e3912369 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14) a small
bug in that series was fixed.

After those two there was one caller left in sequencer.c that used the
"failure_errno', but as of the preceding commit it uses a boilerplate
"ignore_errno" instead.

This leaves the public refs API without any use of "failure_errno" at
all. We could still do with a bit of cleanup and generalization
between refs.c and refs/files-backend.c before the "reftable"
integration lands, but that's all internal to the reference code
itself.

So let's remove this output parameter. Not only isn't it used now, but
it's unlikely that we'll want it again in the future. We'd like to
slowly move the refs API to a more file-backend independent way of
communicating error codes, having it use a "failure_errno" was only
the first step in that direction. If this or any other function needs
to communicate what specifically is wrong with the requested "refname"
it'll be better to have the function set some output enum of
well-defined error states than piggy-backend on "errno".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-26 15:58:41 -08:00
Patrick Steinhardt 2ed1b64ebd refs: skip hooks when deleting uncovered packed refs
When deleting refs from the loose-files refs backend, then we need to be
careful to also delete the same ref from the packed refs backend, if it
exists. If we don't, then deleting the loose ref would "uncover" the
packed ref. We thus always have to queue up deletions of refs for both
the loose and the packed refs backend. This is done in two separate
transactions, where the end result is that the reference-transaction
hook is executed twice for the deleted refs.

This behaviour is quite misleading: it's exposing implementation details
of how the files backend works to the user, in contrast to the logical
updates that we'd really want to expose via the hook. Worse yet, whether
the hook gets executed once or twice depends on how well-packed the
repository is: if the ref only exists as a loose ref, then we execute it
once, otherwise if it is also packed then we execute it twice.

Fix this behaviour and don't execute the reference-transaction hook at
all when refs in the packed-refs backend if it's driven by the files
backend. This works as expected even in case the refs to be deleted only
exist in the packed-refs backend because the loose-backend always queues
refs in its own transaction even if they don't exist such that they can
be locked for concurrent creation. And it also does the right thing in
case neither of the backends has the ref because that would cause the
transaction to fail completely.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17 11:01:45 -08:00
Patrick Steinhardt ffad994138 refs: do not execute reference-transaction hook on packing refs
The reference-transaction hook is supposed to track logical changes to
references, but it currently also gets executed when packing refs in a
repository. This is unexpected and ultimately not all that useful:
packing refs is not supposed to result in any user-visible change to the
refs' state, and it ultimately is an implementation detail of how refs
stores work.

Fix this excessive execution of the hook when packing refs.

Reported-by: Waleed Khan <me@waleedkhan.name>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17 11:01:45 -08:00