When writing a commit-graph, we use the chunk-format API to write out
each individual chunk of the commit-graph. Each chunk of the
commit-graph is tracked via a call to `add_chunk()`, along with the
expected size of that chunk.
Similar to an earlier commit which handled the identical issue in the
MIDX machinery, guard against overflow when dealing with a commit-graph
with a large number of entries to avoid corrupting the contents of the
commit-graph itself.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a bitmap is used to answer some reachability query, it creates a
pseudo-bitmap called the "extended index" on top of any existing bitmaps
to store objects that are relevant to the query, but not mentioned in
the bitmap.
When looking up the ith object in the extended index in a bitmap, it is
common to write something like:
bitmap_get(result, i + bitmap_num_objects(bitmap_git))
, indicating that we want the ith object following all other objects
mentioned in the bitmap_git.
Since the type of `i` and the return type of `bitmap_num_objects()` are
both `uint32_t`s, But if there are either a large number of objects in
the bitmap, or a large number of objects in the extended index (or
both), this addition can overflow when the sum is greater than 2^32-1.
Having that large of a bitmap position is entirely acceptable, but we
need to ensure that the computed bitmap position for that object is
performed using 64-bits and doesn't overflow.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as in previous commits, avoid an integer overflow
when computing the expected size of a MIDX.
(Note that this is also OK as-is, since `p->pack_size` is an `off_t`, so
this computation should already be done as 64-bit integers. But again,
let's use `st_mult()` to make this fact clear).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a MIDX, we use the chunk-format API to write out each
individual chunk of the MIDX. Each chunk of the MIDX is tracked via a
call to `add_chunk()`, along with the expected size of that chunk.
Guard against overflow when dealing with a MIDX with a large number of
entries (and consequently, large chunks within the MIDX file itself) to
avoid corrupting the contents of the MIDX itself.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the `write_midx_context` structure, we use two `uint32_t`'s to track
the length and allocated size of the packs, and one `uint32_t` to track
the number of objects in the MIDX.
In practice, having these be 32-bit unsigned values shouldn't cause any
problems since we are unlikely to have that many objects or packs in any
real-world repository. But these values should be `size_t`'s, so change
their type to reflect that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as previous patches, avoid an overflow when looking
up object offsets in the MIDX's large offset table by guarding the
computation via `st_mult()`.
This instance is also OK as-is, since the left operand is the result of
`sizeof(...)`, which is already a `size_t`. But use `st_mult()` instead
here to make it explicit that this computation is to be performed using
64-bit unsigned integers.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as previous commits, avoid overflow when looking up
an object's OID in a MIDX when its position is greater than
`2^32-1/m->hash_len`.
As usual, it is perfectly OK for a MIDX to have as many as 2^32-1
objects (since we use 32-bit fields to count the number of objects at
each fanout layer). But if we have more than `2^32-1/m->hash_len` number
of objects, we will incorrectly perform the computation using 32-bit
integers, overflowing the result.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `midx_fanout` struct is used to keep track of a set of OIDs
corresponding to each layer of the MIDX's fanout table. It stores an
array of entries, along with the number of entries in the table, and the
allocated size of the array.
Both `nr` and `alloc` are stored as 32-bit unsigned integers. In
practice, this should never cause any problems, since most packs have
far fewer than 2^32-1 objects.
But storing these as `size_t`'s is more appropriate, and prevents us
from accidentally overflowing some result when multiplying or adding to
either of these values. Update these struct members to be `size_t`'s as
appropriate.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as the previous commits, ensure that we use
`st_add()` or `st_mult()` when computing values that may overflow the
32-bit unsigned limit.
Note that in each of these instances, we prevent 32-bit overflow
already since we have explicit casts to `size_t`.
So this code is OK as-is, but let's clarify it by using the `st_xyz()`
helpers to make it obvious that we are performing the relevant
computations using 64 bits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prevent an overflow when locating a pack's CRC offset when the number
of packed items is greater than 2^32-1/hashsz by guarding the
computation with an `st_mult()`.
Note that to avoid truncating the result, the `crc_offset` member must
itself become a `size_t`. The only usage of this variable (besides the
assignment in `load_idx()`) is in `read_v2_anomalous_offsets()` in the
index-pack code. There we use the `crc_offset` as a pointer offset, so
we are already equipped to handle the type change.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many callback interfaces have an extra void data parameter, but we don't
always need it (especially for dumping functions like the ones in test
helpers). Mark them as unused to avoid -Wunused-parameter warnings.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We iterate over the set of input tag names using callbacks. But not all
operations need the same inputs, so some parameters go unused (but of
course not the same ones for each operation). Mark the unused ones to
avoid -Wunused-parameter warnings.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't need to use the "data" parameter in this instance. Let's mark
it to avoid -Wunused-parameter warnings.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't look at the "commit" parameter to our callback, as our
"mergetag_data" pointer contains the original name "ref", which we use
instead. But we can't get rid of it, since other for_each_mergetag
callbacks do use it. Let's mark the parameter to avoid
-Wunused-parameter warnings.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't look at the "flags" parameter, which is natural for something
that is just printing the contents of the replace refs. But let's mark
it to appease -Wunused-parameter.
This probably should have been part of 63e14ee2d6 (refs: mark unused
each_ref_fn parameters, 2022-08-19), but I missed it as this one is a
repo_each_ref_fn, which takes an extra repository argument.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our threeway_callback() does not bother to look at its "n" parameter. It
is static in this file and used only by trivial_merge_trees(), which
always passes 3 trees (hence the name "threeway"). It also does not look
at "dirmask". This is OK, as it handles directories specifically by
looking at the mode bits.
Other traverse_info callbacks need these, so we can't get drop them from
the interface. But let's annotate these ones to avoid complaints from
-Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few callback functions which are used with the fsck code,
but it's natural that not all callbacks need all parameters. For
reporting, even something as obvious as "the oid of the object which had
a problem" is not always used, as some callers are only checking a
single object in the first place. And for both reporting and walking,
things like void data pointers and the fsck_options aren't always
necessary.
But since each such parameter is used by _some_ callback, we have to
keep them in the interface. Mark the unused ones in specific callbacks
to avoid triggering -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The setup_revision_opt struct has a "tweak" function pointer, which can
be used to adjust parameters after setup_revisions() parses arguments,
but before it finalizes setup. In addition to the rev_info struct, the
callback receives a pointer to the setup_revision_opt, as well.
But none of the existing callbacks looks at the extra "opt" parameter,
leading to -Wunused-parameter warnings.
We could mark it as UNUSED, but instead let's remove it entirely. It's
conceivable that it could be useful for a callback to have access to the
"opt" struct. But in the 13 years that this mechanism has existed,
nobody has used it. So let's just drop it in the name of simplifying.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callbacks to for_each_altodb() get a void data pointer, but we don't
need it here. Mark it as unused to silence -Wunused-parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing the input, we have a "keep_cr" parameter to tell us how to
handle line endings. But this doesn't apply to stgit or hg patches
(which are not mailbox formats where we have to worry about that), so we
ignore the parameter entirely in those functions.
Let's mark these as unused so that -Wunused-parameter does not complain
about them.
Note that we could just drop these parameters entirely. They are
necessary to conform to the mail_conv_fn interface used by
split_mail_conv(), but these two callbacks are the only ones used with
that function. The other formats (which _do_ care about keep_cr) use
split_mail_mbox(). But it's conceivable that we'd eventually add another
format that does care about this option, so let's leave it as part of
the generic interface.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The xml_start_tag() function is passed the expat library's
XML_SetElementHandler() function, so it has to conform to the
expected interface. But we don't actually care about the attributes
list. Mark it so that -Wunused-parameter does not complain.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These functions are all used as callbacks for curl, so they have to
conform to a particular interface. But they don't need all of their
parameters:
- fwrite_null() throws away the input, so it doesn't look at most
parameters
- fwrite_wwwauth() in theory could take the auth struct in its void
pointer, but instead we just access it as the global http_auth
(matching the rest of the code in this file)
- curl_trace() always writes via the trace mechanism, so it doesn't
need its void pointer to know where to send things. Likewise, it
ignores the CURL parameter, since nothing we trace requires querying
the handle.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function gets a repository parameter because it's a callback for
do_for_each_repo_ref_iterator(). But it's just a wrapper that passes
along each call to a regular each_ref_fn callback, and the latter
doesn't accept a repository argument.
Probably in the long run all of the each_ref_fn callbacks should get a
repository parameter, too. But changing that now would require updates
all over the code base. Until that happens, let's annotate this wrapper
callback to quiet the compiler's -Wunused-parameter warning.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reflog-expire command has been unimplemented since it was added in
80f2a6097c (t/helper: add test-ref-store to test ref-store functions,
2017-03-26). This causes -Wunused-parameter to complain, since the
function just calls die() without looking at its arguments.
We could mark these as UNUSED to silence the warning. But let's just
drop the function. It has no callers in the test suite and is not doing
anything useful, beyond perhaps reminding us that it's something we
_could_ be testing.
But since the bulk of the work in adding such tests would be the shell
bits that actually examine the reflog state before and after expiration,
this is not even a useful step in that direction. Somebody who wants to
do that work later can easily add this function back.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two messages were introduced in 8ba221e245 (bundle: output hash
information in 'verify', 2022-03-22) and 105c6f14ad (bundle: parse
filter capability, 2022-03-09) but never for translation.
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a narrow but common case, the user is the only author of a branch and
doesn't mind overwriting the corresponding branch on the remote. This
workflow is especially common on GitHub, GitLab, and Gerrit, which keep
a permanent record of every version of a branch that is pushed while a
pull request is open for that branch. On those platforms, force-pushing
is encouraged and is analogous to emailing a new version of a patchset.
When giving advice about divergent branches, tell the user about
`git pull`, but don't unconditionally instruct the user to do it. A less
prescriptive message will help prevent users from thinking that they are
required to create an integrated history instead of simply replacing the
previous history. Also, don't put `git pull` in an awkward
parenthetical, because `git pull` can always be used to reconcile
branches and is the normal way to do so.
Due to the difficulty of knowing which command for force-pushing is best
suited to the user's situation, no specific advice is given about
force-pushing. Instead, the user is directed to the Git documentation to
read about possible ways forward that do not involve integration.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a narrow but common case, the user is the only author of a branch and
doesn't mind overwriting the corresponding branch on the remote. This
workflow is especially common on GitHub, GitLab, and Gerrit, which keep
a permanent record of every version of a branch that is pushed while a
pull request is open for that branch. On those platforms, force-pushing
is encouraged and is analogous to emailing a new version of a patchset.
When giving advice about divergent branches, tell the user about
`git pull`, but don't unconditionally instruct the user to do it. A less
prescriptive message will help prevent users from thinking that they are
required to create an integrated history instead of simply replacing the
previous history. Likewise, don't imply that `git pull` is only for
merging.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the user is in the middle of making a commit, they are not yet at
the point where they are ready to think about integrating their local
branch with the corresponding remote branch or force-pushing over the
remote branch. Don't include advice on how to deal with divergent
branches in the commit template, to avoid giving the impression that the
divergence needs to be dealt with immediately. Similar advice will be
printed when it is most relevant, that is, if the user does try to push
without first reconciling the two branches.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 37fec86a83 (packfile: abstract away hash constant values,
2018-05-02), `nth_packed_object_id()` started using the variable
`the_hash_algo->rawsz` instead of a fixed constant when trying to
compute an offset into the ".idx" file for some object position.
This can lead to surprising truncation when looking for an object
towards the end of a large enough pack, like the following:
(gdb) p hashsz
$1 = 20
(gdb) p n
$2 = 215043814
(gdb) p hashsz * n
$3 = 5908984
, which is a debugger session broken on a known-bad call to the
`nth_packed_object_id()` function.
This behavior predates 37fec86a83, and is original to the v2 index
format, via: 74e34e1fca (sha1_file.c: learn about index version 2,
2007-04-09).
This is due to §6.4.4.1 of the C99 standard, which states that an
untyped integer constant will take the first type in which the value can
be accurately represented, among `int`, `long int`, and `long long int`.
Since 20 can be represented as an `int`, and `n` is a 32-bit unsigned
integer, the resulting computation is defined by §6.3.1.8, and the
(signed) integer value representing `n` is converted to an unsigned
type, meaning that `20 * n` (for `n` having type `uint32_t`) is
equivalent to a multiplication between two unsigned 32-bit integers.
When multiplying a sufficiently large `n`, the resulting value can
exceed 2^32-1, wrapping around and producing an invalid result. Let's
follow the example in f86f769550 (compute pack .idx byte offsets using
size_t, 2020-11-13) and replace this computation with `st_mult()`, which
will ensure that the computation is done using 64-bits.
While here, guard the corresponding computation for packs with v1
indexes, too. Though the likelihood of seeing a bug there is much
smaller, since (a) v1 indexes are generated far less frequently than v2
indexes, and (b) they all correspond to packs no larger than 2 GiB, so
having enough objects to trigger this overflow is unlikely if not
impossible.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repacking, the function `collect_pack_filenames()` is responsible
for collecting the set of existing packs in the repository, and
partitioning them into "kept" (if the pack has a ".keep" file or was
given via `--keep-pack`) and "nonkept" (otherwise) lists.
This function comes from the original C port of git-repack.sh from back
in a1bbc6c017 (repack: rewrite the shell script in C, 2013-09-15),
where it first appears as `get_non_kept_pack_filenames()`. At the time,
the implementation was a fairly direct translation from the relevant
portion of git-repack.sh, which looped over the results of
find "$PACKDIR" -type f -name '*.pack'
either ignoring the pack as kept, or adding it to the list of existing
packs.
So the choice to directly translate this function in terms of
`readdir()` in a1bbc6c017 made sense. At the time, it was possible to
refine the C version in terms of packed_git structs, but was never done.
However, manually enumerating a repository's packs via `readdir()` is
confusing and error-prone. It leads to frustrating inconsistencies
between which packs Git considers to be part of a repository (i.e.,
could be found in the list of packs from `get_all_packs()`), and which
packs `collect_pack_filenames()` considers to meet the same criteria.
This bit us in 73320e49ad (builtin/repack.c: only collect fully-formed
packs, 2023-06-07), and again in the previous commit.
Prevent these issues from biting us in the future by implementing the
`collect_pack_filenames()` function by looping over an array of pointers
to `packed_git` structs, ensuring that we use the same criteria to
determine the set of available packs.
One gotcha here is that we have to ignore non-local packs, since the
original version of `collect_pack_filenames()` only looks at the local
pack directory to collect existing packs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 73320e49ad (builtin/repack.c: only collect fully-formed packs,
2023-06-07), we switched the check for which packs to collect by
starting at the .idx files and looking for matching .pack files. This
avoids trying to repack pack-files that have not had their pack-indexes
installed yet.
However, it does cause maintenance to halt if we find the (problematic,
but not insurmountable) case of a .idx file without a corresponding
.pack file. In an environment where packfile maintenance is a critical
function, such a hard stop is costly and requires human intervention to
resolve (by deleting the .idx file).
This was not the case before. We successfully repacked through this
scenario until the recent change to scan for .idx files.
Further, if we are actually in a case where objects are missing, we
detect this at a different point during the reachability walk.
In other cases, Git prepares its list of packfiles by scanning .idx
files and then only adds it to the packfile list if the corresponding
.pack file exists. It even does so without a warning! (See
add_packed_git() in packfile.c for details.)
This case is much less likely to occur than the failures seen before
73320e49ad. Packfiles are "installed" by writing the .pack file before
the .idx and that process can be interrupted. Packfiles _should_ be
deleted by deleting the .idx first, followed by the .pack file, but
unlink_pack_path() does not do this: it deletes the .pack _first_,
allowing a window where this process could be interrupted. We leave the
consideration of changing this order as a separate concern. Knowing that
this condition is possible from interrupted Git processes and not other
tools lends some weight that Git should be more flexible around this
scenario.
Add a check to see if the .pack file exists before adding it to the list
for repacking. This will stop a number of maintenance failures seen in
production but fixed by deleting the .idx files.
This brings us closer to the case before 73320e49ad in that 'git
repack' will not fail when there is an orphaned .idx file, at least, not
due to the way we scan for packfiles. In the case that the .pack file
was erroneously deleted without copies of its objects in other installed
packfiles, then 'git repack' will fail due to the reachable object walk.
This does resolve the case where automated repacks will no longer be
halted on this case. The tests in t7700 show both these successful
scenarios and the case of failing if the .pack was truly required.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as in previous commits, teach `ls-refs` to avoid
enumerating hidden references where possible.
As before, this is linux.git with one hidden reference per commit.
$ hyperfine -L v ,.compile 'git{v} -c protocol.version=2 ls-remote .'
Benchmark 1: git -c protocol.version=2 ls-remote .
Time (mean ± σ): 89.8 ms ± 0.6 ms [User: 84.3 ms, System: 5.7 ms]
Range (min … max): 88.8 ms … 91.3 ms 32 runs
Benchmark 2: git.compile -c protocol.version=2 ls-remote .
Time (mean ± σ): 6.5 ms ± 0.1 ms [User: 2.4 ms, System: 4.3 ms]
Range (min … max): 6.2 ms … 8.3 ms 397 runs
Summary
'git.compile -c protocol.version=2 ls-remote .' ran
13.85 ± 0.33 times faster than 'git -c protocol.version=2 ls-remote .'
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as a previous commit, teach `upload-pack` to avoid
enumerating hidden references where possible.
Note, however, that there are certain cases where cannot avoid
enumerating even hidden references, in particular when either of:
- `uploadpack.allowTipSHA1InWant`, or
- `uploadpack.allowReachableSHA1InWant`
are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`,
respectively.
When either of these bits are set, upload-pack's `is_our_ref()` function
needs to consider the `HIDDEN_REF` bit of the referent's object flags.
So we must visit all references, including the hidden ones, in order to
mark their referents with the `HIDDEN_REF` bit.
When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` are set, the
`is_our_ref()` function considers only the `OUR_REF` bit, and not the
`HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only
to objects at the tips of non-hidden references, so we do not need to
visit hidden references in this case.
When neither of those bits are set, `upload-pack` can potentially avoid
enumerating a large number of references. In the same example as a
previous commit (linux.git with one hidden reference per commit,
"refs/pull/N"):
$ printf 0000 >in
$ hyperfine --warmup=1 \
'git -c transfer.hideRefs=refs/pull upload-pack . <in' \
'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \
'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in'
Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in
Time (mean ± σ): 406.9 ms ± 1.1 ms [User: 357.3 ms, System: 49.5 ms]
Range (min … max): 405.7 ms … 409.2 ms 10 runs
Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in
Time (mean ± σ): 406.5 ms ± 1.3 ms [User: 356.5 ms, System: 49.9 ms]
Range (min … max): 404.6 ms … 408.8 ms 10 runs
Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in
Time (mean ± σ): 4.7 ms ± 0.2 ms [User: 0.7 ms, System: 3.9 ms]
Range (min … max): 4.3 ms … 6.1 ms 472 runs
Summary
'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran
86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in'
86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in'
As above, we must visit every reference when
uploadPack.allowTipSHA1InWant is set. But when it is unset, we can visit
far fewer references.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that `refs_for_each_fullref_in()` has the ability to avoid
enumerating references matching certain pattern(s), use that to avoid
visiting hidden refs when constructing the ref advertisement via
receive-pack.
Note that since this exclusion is best-effort, we still need
`show_ref_cb()` to check whether or not each reference is hidden or not
before including it in the advertisement.
As was the case when applying this same optimization to `upload-pack`,
`receive-pack`'s reference advertisement phase can proceed much quicker
by avoiding enumerating references that will not be part of the
advertisement.
(Below, we're still using linux.git with one hidden refs/pull/N ref per
commit):
$ hyperfine -L v ,.compile 'git{v} -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'
Benchmark 1: git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
Time (mean ± σ): 89.1 ms ± 1.7 ms [User: 82.0 ms, System: 7.0 ms]
Range (min … max): 87.7 ms … 95.5 ms 31 runs
Benchmark 2: git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git
Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 0.5 ms, System: 3.9 ms]
Range (min … max): 4.1 ms … 5.6 ms 508 runs
Summary
'git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git' ran
20.00 ± 1.05 times faster than 'git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git'
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent commits, we'll teach `receive-pack` and `upload-pack` to
use the new jump list feature in the packed-refs iterator by ignoring
references which are mentioned via its respective hideRefs lists.
However, the packed-ref jump lists cannot handle un-hiding rules (that
begin with '!'), or namespace comparisons (that begin with '^'). Add a
convenience function to the refs.h API to detect when either of these
conditions are met, and returns an appropriate value to pass as excluded
patterns.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future commit will want to call `for_each_namespaced_ref()` with
a list of excluded patterns.
We could introduce a variant of that function, say,
`for_each_namespaced_ref_exclude()` which takes the extra parameter, and
reimplement the original function in terms of that. But all but one
caller (in `http-backend.c`) will supply the new parameter, so add the
new parameter to `for_each_namespaced_ref()` itself instead of
introducing a new function.
For now, supply NULL for the list of excluded patterns at all callers to
avoid changing behavior, which we will do in a future change.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent commits, it will be convenient to have a 'const char **'
of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`,
etc.), instead of a `string_list`.
Convert spots throughout the tree that store the list of hidden refs
from a `string_list` to a `strvec`.
Note that in `parse_hide_refs_config()` there is an ugly const-cast used
to avoid an extra copy of each value before trimming any trailing slash
characters. This could instead be written as:
ref = xstrdup(value);
len = strlen(ref);
while (len && ref[len - 1] == '/')
ref[--len] = '\0';
strvec_push(hide_refs, ref);
free(ref);
but the double-copy (once when calling `xstrdup()`, and another via
`strvec_push()`) is wasteful.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit added low-level tests to ensure that the packed-refs
iterator did not enumerate excluded sections of the refspace.
However, there was no guarantee that these sections weren't being
visited, only that they were being suppressed from the output. To harden
these tests, add a trace2 counter which tracks the number of regions
skipped by the packed-refs iterator, and assert on its value.
Suggested-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When iterating through the `packed-refs` file in order to answer a query
like:
$ git for-each-ref --exclude=refs/__hidden__
it would be useful to avoid walking over all of the entries in
`refs/__hidden__/*` when possible, since we know that the ref-filter
code is going to throw them away anyways.
In certain circumstances, doing so is possible. The algorithm for doing
so is as follows:
- For each excluded pattern, find the first record that matches it,
and the first record that *doesn't* match it (i.e. the location
you'd next want to consider when excluding that pattern).
- Sort the set of excluded regions from the previous step in ascending
order of the first location within the `packed-refs` file that
matches.
- Clean up the results from the previous step: discard empty regions,
and combine adjacent regions. The set of regions which remains is
referred to as the "jump list", and never contains any references
which should be included in the result set.
Then when iterating through the `packed-refs` file, if `iter->pos` is
ever contained in one of the regions from the previous steps, advance
`iter->pos` past the end of that region, and continue enumeration.
Note that we only perform this optimization when none of the excluded
pattern(s) have special meta-characters in them. For a pattern like
"refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and
everything underneath them) are not connected. A future implementation
that handles this case may split the character class (pretending as if
two patterns were excluded: "refs/fooa", and "refs/fooc").
There are a few other gotchas worth considering. First, note that the
jump list is sorted, so once we jump past a region, we can avoid
considering it (or any regions preceding it) again. The member
`jump_pos` is used to track the first next-possible region to jump
through.
Second, note that the jump list is best-effort, since we do not handle
loose references, and because of the meta-character issue above. The
jump list may not skip past all references which won't appear in the
results, but will never skip over a reference which does appear in the
result set.
In repositories with a large number of hidden references, the speed-up
can be significant. Tests here are done with a copy of linux.git with a
reference "refs/pull/N" pointing at every commit, as in:
$ git rev-list HEAD | awk '{ print "create refs/pull/" NR " " $0 }' |
git update-ref --stdin
$ git pack-refs --all
, it is significantly faster to have `for-each-ref` jump over the
excluded references, as opposed to filtering them out after the fact:
$ hyperfine \
'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"' \
'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' \
'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"
Time (mean ± σ): 798.1 ms ± 3.3 ms [User: 687.6 ms, System: 146.4 ms]
Range (min … max): 794.5 ms … 805.5 ms 10 runs
Benchmark 2: git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
Time (mean ± σ): 98.9 ms ± 1.4 ms [User: 93.1 ms, System: 5.7 ms]
Range (min … max): 97.0 ms … 104.0 ms 29 runs
Benchmark 3: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"
Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 0.7 ms, System: 3.8 ms]
Range (min … max): 4.1 ms … 5.8 ms 524 runs
Summary
'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran
21.87 ± 1.05 times faster than 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"'
176.52 ± 8.19 times faster than 'git for-each-ref --format="%(objectname) %(refname)" | grep -vE "^[0-9a-f]{40} refs/pull/"'
(Comparing stock git and this patch isn't quite fair, since an earlier
commit in this series adds a naive implementation of the `--exclude`
option. `git.prev` is built from the previous commit and includes this
naive implementation).
Using the jump list is fairly straightforward (see the changes to
`refs/packed-backend.c::next_record()`), but constructing the list is
not. To ensure that the construction is correct, add a new suite of
tests in t1419 covering various corner cases (overlapping regions,
partially overlapping regions, adjacent regions, etc.).
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `find_reference_location()` is used to perform a
binary search-like function over the contents of a repository's
`$GIT_DIR/packed-refs` file.
The search it implements is unlike a standard binary search in that the
records it searches over are not of a fixed width, so the comparison
must locate the end of a record before comparing it.
Extract the core routine of `find_reference_location()` in order to
implement a function in the following patch which will find the first
location in the `packed-refs` file that *doesn't* match the given
pattern.
The behavior of `find_reference_location()` is unchanged.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The subsequent patch will want to access an optional `excluded_patterns`
array within `refs/packed-backend.c` that will cull out certain
references matching any of the given patterns on a best-effort basis.
To do so, the refs subsystem needs to be updated to pass this value
across a number of different locations.
Prepare for a future patch by introducing this plumbing now, passing
NULLs at top-level APIs in order to make that patch less noisy and more
easily readable.
Signed-off-by: Taylor Blau <me@ttaylorr.co>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using `for-each-ref`, it is sometimes convenient for the caller to
be able to exclude certain parts of the references.
For example, if there are many `refs/__hidden__/*` references, the
caller may want to emit all references *except* the hidden ones.
Currently, the only way to do this is to post-process the output, like:
$ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/'
Which is do-able, but requires processing a potentially large quantity
of references.
Teach `git for-each-ref` a new `--exclude=<pattern>` option, which
excludes references from the results if they match one or more excluded
patterns.
This patch provides a naive implementation where the `ref_filter` still
sees all references (including ones that it will discard) and is left to
check whether each reference matches any excluded pattern(s) before
emitting them.
By culling out references we know the caller doesn't care about, we can
avoid allocating memory for their storage, as well as spending time
sorting the output (among other things). Even the naive implementation
provides a significant speed-up on a modified copy of linux.git (that
has a hidden ref pointing at each commit):
$ hyperfine \
'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \
'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/'
Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"
Time (mean ± σ): 820.1 ms ± 2.0 ms [User: 703.7 ms, System: 152.0 ms]
Range (min … max): 817.7 ms … 823.3 ms 10 runs
Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/
Time (mean ± σ): 106.6 ms ± 1.1 ms [User: 99.4 ms, System: 7.1 ms]
Range (min … max): 104.7 ms … 109.1 ms 27 runs
Summary
'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran
7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"'
Subsequent patches will improve on this by avoiding visiting excluded
sections of the `packed-refs` file in certain cases.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`match_pattern()` and `match_name_as_path()` both take a `struct
ref_filter *`, and then store a stack variable `patterns` pointing at
`filter->patterns`.
The subsequent patch will add a new array of patterns to match over (the
excluded patterns, via a new `git for-each-ref --exclude` option),
treating the return value of these functions differently depending on
which patterns are being used to match.
Tweak `match_pattern()` and `match_name_as_path()` to take an array of
patterns to prepare for passing either in.
Once we start passing either in, `match_pattern()` will have little to
do with a particular `struct ref_filter *` instance. To clarify this,
drop it from the argument list, and replace it with the only bit of the
`ref_filter` that we care about (`filter->ignore_case`).
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We did not bother to clean up at all in `git branch` or `git tag`, and
`git for-each-ref` only cleans up a couple of members.
Add and call `ref_filter_clear()` when cleaning up a `struct
ref_filter`. Running this patch (without any test changes) indicates a
couple of now leak-free tests. This was found by running:
$ make SANITIZE=leak
$ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate
(Note that the `reachable_from` and `unreachable_from` lists should be
cleaned as they are used. So this is just covering any case where we
might bail before running the reachability check.)
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `reach_filter()`, we pop all commits from the reachable lists,
leaving them empty. But because we're operating on a list pointer that
was passed by value, the original `filter.reachable_from` and
`filter.unreachable_from` pointers are left dangling.
As is the case with the previous commit, nobody touches either of these
fields after calling `reach_filter()`, so leaving them dangling is OK.
But this future proofs against dangerous situations.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide a sane initialization value for `struct ref_filter`, which in a
subsequent patch will be used to initialize a new field.
In the meantime, ensure that the `ref_filter` struct used in the
test-helper's `cmd__reach()` is zero-initialized. The lack of
initialization is OK, since `commit_contains()` only looks at the single
`with_commit_tag_algo` field that *is* initialized directly above.
So this does not fix a bug, but rather prevents one from biting us in
the future.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The refs machinery has its own implementation of a `ref_filter` (used by
`for-each-ref`), which is distinct from the `ref-filter.h` API (also
used by `for-each-ref`, among other things).
Rename the one within refs.c to more clearly indicate its purpose.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Link to community list of credential helpers. This is useful information
for users.
Describe how OAuth credential helpers work. OAuth is a user-friendly
alternative to personal access tokens and SSH keys. Reduced setup cost
makes it easier for users to contribute to projects across multiple
forges.
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `git commit-graph verify` was taught how to verify commit-graph
chains in 3da4b609bb (commit-graph: verify chains with --shallow mode,
2019-06-18), it produced one line of progress per layer of the
commit-graph chain.
$ git.compile commit-graph verify
Verifying commits in commit graph: 100% (4356/4356), done.
Verifying commits in commit graph: 100% (131912/131912), done.
This could be somewhat confusing to users, who may wonder why there are
multiple occurrences of "Verifying commits in commit graph".
There are likely good arguments on whether or not there should be
one line of progress output per commit-graph layer. On the one hand, the
existing output shows us verifying each individual layer of the chain.
But on the other hand, the fact that a commit-graph may be stored among
multiple layers is an implementation detail that the caller need not be
aware of.
Clarify this by showing a single progress meter regardless of the number
of layers in the commit-graph chain. After this patch, the output
reflects the logical contents of a commit-graph chain, instead of
showing one line of output per commit-graph layer:
$ git.compile commit-graph verify
Verifying commits in commit graph: 100% (136268/136268), done.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the final step to prepare for consolidating the output of `git
commit-graph verify`. Instead of having each call to
`verify_one_commit_graph()` initialize its own progress struct, have the
caller pass one in instead.
This patch does not alter the output of `git commit-graph verify`, but
the next commit will consolidate the output.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>