"git diff --exit-code" ignored modified binary files, which has
been corrected.
* rs/diff-exit-code-binary:
diff: report modified binary files as changes in builtin_diff()
The reftable backend learned to more efficiently handle exclude
patterns while enumerating the refs.
* ps/reftable-exclude:
refs/reftable: wire up support for exclude patterns
reftable/reader: make table iterator reseekable
t/unit-tests: introduce reftable library
Makefile: stop listing test library objects twice
builtin/receive-pack: fix exclude patterns when announcing refs
refs: properly apply exclude patterns to namespaced refs
"git apply" had custom buffer management code that predated before
use of strbuf got widespread, which has been updated to use strbuf,
which also plugged some memory leaks.
* ps/apply-leakfix:
apply: refactor `struct image` to use a `struct strbuf`
apply: rename members that track line count and allocation length
apply: refactor code to drop `line_allocated`
apply: introduce macro and function to init images
apply: rename functions operating on `struct image`
apply: reorder functions to move image-related things together
When a remote-helper dies before Git writes to it, SIGPIPE killed
Git silently. We now explain the situation a bit better to the end
user in our error message.
* jk/diag-unexpected-remote-helper-death:
print an error when remote helpers die during capabilities
Code clean-up.
* ps/environ-wo-the-repository: (21 commits)
environment: stop storing "core.notesRef" globally
environment: stop storing "core.warnAmbiguousRefs" globally
environment: stop storing "core.preferSymlinkRefs" globally
environment: stop storing "core.logAllRefUpdates" globally
refs: stop modifying global `log_all_ref_updates` variable
branch: stop modifying `log_all_ref_updates` variable
repo-settings: track defaults close to `struct repo_settings`
repo-settings: split out declarations into a standalone header
environment: guard state depending on a repository
environment: reorder header to split out `the_repository`-free section
environment: move `set_git_dir()` and related into setup layer
environment: make `get_git_namespace()` self-contained
environment: move object database functions into object layer
config: make dependency on repo in `read_early_config()` explicit
config: document `read_early_config()` and `read_very_early_config()`
environment: make `get_git_work_tree()` accept a repository
environment: make `get_graft_file()` accept a repository
environment: make `get_index_file()` accept a repository
environment: make `get_object_directory()` accept a repository
environment: make `get_git_common_dir()` accept a repository
...
The diff machinery has two ways to detect changes to set the exit code:
Just comparing hashes and comparing blob contents. The latter is needed
if certain changes have to be ignored, e.g. with --ignore-space-change
or --ignore-matching-lines. It's enabled by the diff_options flag
diff_from_contents.
The code for handling binary files added by 1aaf69e669 (diff: shortcut
for diff'ing two binary SHA-1 objects, 2014-08-16) always uses a quick
hash-only comparison, even if the slow way is taken. We need it to
report a hash difference as a change for the purpose of setting the
exit code, though, but it never did. Fix that.
d7b97b7185 (diff: let external diffs report that changes are
uninteresting, 2024-06-09) set diff_from_contents if external diff
programs are allowed. This is the default e.g. for git diff, and so
that change exposed the inconsistency much more widely.
Reported-by: Kohei Shibata <shiba200712@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Git 2.39, Git.pm stopped working in a bare repository, which has
been corrected.
* jk/git-pm-bare-repo-fix:
Git.pm: use "rev-parse --absolute-git-dir" rather than perl code
Git.pm: fix bare repository search with Directory option
The support to customize build options to adjust for older versions
and/or older systems for the interop tests has been improved.
* jk/interop-test-build-options:
t/interop: allow per-version make options
"git rebase --autostash" failed to resurrect the autostashed
changes when the command gets aborted after giving back control
asking for hlep in conflict resolution.
* pw/rebase-autostash-fix:
rebase: apply and cleanup autostash when rebase fails to start
The error messages from the test script checker have been improved.
* es/chainlint-message-updates:
chainlint: reduce annotation noise-factor
chainlint: make error messages self-explanatory
chainlint: don't be fooled by "?!...?!" in test body
Import clar unit tests framework libgit2 folks invented for our
use.
* ps/clar-unit-test:
Makefile: rename clar-related variables to avoid confusion
clar: add CMake support
t/unit-tests: convert ctype tests to use clar
t/unit-tests: convert strvec tests to use clar
t/unit-tests: implement test driver
Makefile: wire up the clar unit testing framework
Makefile: do not use sparse on third-party sources
Makefile: make hdr-check depend on generated headers
Makefile: fix sparse dependency on GENERATED_H
clar: stop including `shellapi.h` unnecessarily
clar(win32): avoid compile error due to unused `fs_copy()`
clar: avoid compile error with mingw-w64
t/clar: fix compatibility with NonStop
t: import the clar unit testing framework
t: do not pass GIT_TEST_OPTS to unit tests with prove
The `struct image` uses a character array to track the pre- or postimage
of a patch operation. This has multiple downsides:
- It is somewhat hard to track memory ownership. In fact, we have
several memory leaks in git-apply(1) because we do not (and cannot
easily) free the buffer in all situations.
- We have to reinvent the wheel and manually implement a lot of
functionality that would already be provided by `struct strbuf`.
- We have to carefully track whether `update_pre_post_images()` can do
an in-place update of the postimage or whether it has to allocate a
new buffer for it.
This is all rather cumbersome, and especially `update_pre_post_images()`
is really hard to understand as a consequence even though what it is
doing is rather trivial.
Refactor the code to use a `struct strbuf` instead, addressing all of
the above. Like this we can easily perform in-place updates in all
situations, the logic to perform those updates becomes way simpler and
the lifetime of the buffer becomes a ton easier to track.
This refactoring also plugs some leaking buffers as a side effect.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Bugfixes and leak plugging in "git for-each-ref --format=..." code
paths.
* jk/ref-filter-trailer-fixes:
ref-filter: fix leak with unterminated %(if) atoms
ref-filter: add ref_format_clear() function
ref-filter: fix leak when formatting %(push:remoteref)
ref-filter: fix leak with %(describe) arguments
ref-filter: fix leak of %(trailers) "argbuf"
ref-filter: store ref_trailer_buf data per-atom
ref-filter: drop useless cast in trailers_atom_parser()
ref-filter: strip signature when parsing tag trailers
ref-filter: avoid extra copies of payload/signature
t6300: drop newline from wrapped test title
Another reftable test migrated to the unit-test framework.
* cp/unit-test-reftable-stack:
t-reftable-stack: add test for stack iterators
t-reftable-stack: add test for non-default compaction factor
t-reftable-stack: use reftable_ref_record_equal() to compare ref records
t-reftable-stack: use Git's tempfile API instead of mkstemp()
t: harmonize t-reftable-stack.c with coding guidelines
t: move reftable/stack_test.c to the unit testing framework
Exclude patterns can be used by reference backends to skip over blocks
of references that are uninteresting to the caller. Reference backends
do not have to wire up support for them, and all callers are expected to
behave as if the backend didn't support them. In fact, the only backend
that supports exclude patterns right now is the "packed" backend.
Exclude patterns can be quite an important performance optimization in
repositories that have loads of references. The patterns are set up in
case "transfer.hideRefs" and friends are configured during a fetch, so
handling these patterns becomes important once there are lots of hidden
refs in a served repository.
Now that we have properly re-seekable reftable iterators we can also
wire up support for these patterns in the "reftable" backend. Doing so
is conceptually simple: once we hit a reference whose prefix matches the
current exclude pattern we re-seek the iterator to the first reference
that doesn't match the pattern anymore. This schema only works for
trivial patterns that do not have any globbing characters in them, but
this restriction also applies do the "packed" backend.
This makes t1419 work with the "reftable" backend with some slight
modifications. Of course it also speeds up listing of references with
hidden refs. The following benchmark prints one reference with 1 million
hidden references:
Benchmark 1: HEAD~
Time (mean ± σ): 93.3 ms ± 2.1 ms [User: 90.3 ms, System: 2.5 ms]
Range (min … max): 89.8 ms … 97.2 ms 33 runs
Benchmark 2: HEAD
Time (mean ± σ): 4.2 ms ± 0.6 ms [User: 2.2 ms, System: 1.8 ms]
Range (min … max): 3.1 ms … 8.1 ms 765 runs
Summary
HEAD ran
22.15 ± 3.19 times faster than HEAD~
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 67ce50ba26 (Merge branch 'ps/reftable-reusable-iterator', 2024-05-30)
we have refactored the interface of reftable iterators such that they
can be reused in theory. This patch series only landed the required
changes on the interface level, but didn't yet implement the actual
logic to make iterators reusable.
As it turns out almost all of the infrastructure already does support
re-seeking. The only exception is the table iterator, which does not
reset its `is_finished` bit. Do so and add a couple of tests that verify
that we can re-seek iterators.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have recently migrated all of the reftable unit tests that were part
of the reftable library into our own unit testing framework. As part of
that migration we have duplicated some of the functionality that was
part of the reftable test framework into each of the migrated test
suites. This was a sensible decision to not have all of the migrations
dependent on each other, but now that the migration is done it makes
sense to deduplicate the functionality again.
Introduce a new reftable test library that hosts some shared code and
adapt tests to use it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `write_head_info()` we announce references to the remote client. We
need to honor "transfer.hideRefs" here so that we do not announce any
references that the client shouldn't be able to learn about. This is
done via two separate mechanisms:
- We hand over exclude patterns to the reference backend. We can only
honor "plain" exclude patterns here that do not have prefixes with
special meaning such as "^" or "!". Filtering down the references is
handled by `hidden_refs_to_excludes()`.
- In `show_ref_cb()` we perform a second check against hidden refs.
For one this is done such that we can handle those special prefixes.
And second, handling exclude patterns in ref backends is optional,
so we also have to handle "normal" patterns.
The special-meaning "^" prefix alters whether a hidden ref applies to
the namespace-stripped reference name or the full name. So while we
would usually call `refs_for_each_namespaced_ref()` to only get those
references in the current namespace, we can't because we'd get the
already-rewritten reference names. Instead, we are forced to use
`refs_for_each_fullref_in()` and then manually strip away the namespace
prefix such that we have access to both names.
But this also means that we do not get namespace handling for exclude
patterns, which `refs_for_each_namespaced_ref()` brings for free. This
results in a bug because we potentially end up hiding away references
based on their namespaced name and not on the stripped name as we really
should be doing.
Fix this by manually rewriting the exclude patterns to their namespaced
variants.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reference namespaces allow commands like git-upload-pack(1) to serve
different sets of references to the client depending on which namespace
is enabled, which is for example useful in fork networks. Namespaced
refs are stored with a `refs/namespaces/$namespace` prefix, but all the
user will ultimately see is a stripped version where that prefix is
removed.
The way that this interacts with "transfer.hideRefs" is not immediately
obvious: the hidden refs can either apply to the stripped references, or
to the non-stripped ones that still have the namespace prefix. In fact,
the "transfer.hideRefs" machinery does the former and applies to the
stripped reference by default, but rules can have "^" prefixed to switch
this behaviour to instead match against the full reference name.
Namespaces are exclusively handled at the generic "refs" layer, the
respective backends have no clue that such a thing even exists. This
also has the consequence that they cannot handle hiding references as
soon as reference namespaces come into play because they neither know
whether a namespace is active, nor do they know how to strip references
if they are active.
Handling such exclude patterns in `refs_for_each_namespaced_ref()` and
`refs_for_each_fullref_in_prefixes()` is broken though, as both support
that the user passes both namespaces and exclude patterns. In the case
where both are set we will exclude references with unstripped names,
even though we really wanted to exclude references based on their
stripped names.
This only surfaces when:
- A repository uses reference namespaces.
- "transfer.hideRefs" is active.
- The namespaced references are packed into the "packed-refs" file.
None of our tests exercise this scenario, and thus we haven't ever hit
it. While t5509 exercises both (1) and (2), it does not happen to hit
(3). It is trivial to demonstrate the bug though by explicitly packing
refs in the tests, and then we indeed surface the breakage.
Fix this bug by prefixing exclude patterns with the namespace in the
generic layer. The newly introduced function will be used outside of
"refs.c" in the next patch, so we add a declaration to "refs.h".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the test "cc list is sanitized", we feed a commit with a variety of
trailers to send-email, and then check its output to see how it handled
them. For most of them, we are grepping for a specific mention of the
header, but there's a "BugID" header which we expect to be ignored. We
confirm this by grepping for "12345", the fake BugID, and making sure it
is not present.
But we can be fooled by false positives! I just tracked down a flaky
test failure here that was caused by matching this unrelated line in the
output:
<20240914090449.612345-1-author@example.com>
which will change from run to run based on the time, pid, etc.
Ideally we'd tighten the regex to make this more specifically, but since
the point is that it _shouldn't_ be mentioned, it's hard to say what the
right match would be (e.g., would there be a leading space?).
Instead, let's just choose a match that is much less likely to appear.
The actual content of the header isn't important, since it's supposed to
be ignored.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The transport-helper code generally relies on the
remote-helper to provide an informative message to the user
when it encounters an error. In the rare cases where the
helper does not do so, the output can be quite confusing.
E.g.:
$ git clone https://example.com/foo.git
Cloning into 'foo'...
$ echo $?
128
$ ls foo
/bin/ls: cannot access foo: No such file or directory
We tried to address this with 81d340d (transport-helper:
report errors properly, 2013-04-10).
But that makes the common case much more confusing. The
remote helper protocol's method for signaling normal errors
is to simply hang up. So when the helper does encounter a
routine error and prints something to stderr, the extra
error message is redundant and misleading. So we dropped it
again in 266f1fd (transport-helper: be quiet on read errors
from helpers, 2013-06-21).
This puts the uncommon case right back where it started. We
may be able to do a little better, though. It is common for
the helper to die during a "real" command, like fetching the
list of remote refs. It is not common for it to die during
the initial "capabilities" negotiation, right after we
start. Reporting failure here is likely to catch fundamental
problems that prevent the helper from running (and reporting
errors) at all. Anything after that is the responsibility of
the helper itself to report.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interpret-trailers command failed to recognise the end of the
message when the commit log ends in an incomplete line.
* bl/trailers-and-incomplete-last-line-fix:
interpret-trailers: handle message without trailing newline
In a few corner cases "git diff --exit-code" failed to report
"changes" (e.g., renamed without any content change), which has
been corrected.
* rs/diff-exit-code-fix:
diff: report dirty submodules as changes in builtin_diff()
diff: report copies and renames as changes in run_diff_cmd()
"git verify-pack" and "git index-pack" started dying outside a
repository, which has been corrected.
* ps/index-pack-outside-repo-fix:
builtin/index-pack: fix segfaults when running outside of a repo
The last test in t5512 we recently added seems to be flaky.
Running
$ make && cd t && sh ./t5512-ls-remote.sh --stress
shows that "git ls-remote foo::bar" exited with status 141, which
means we got a SIGPIPE. This test piece was introduced by 9e89dcb6
(builtin/ls-remote: fall back to SHA1 outside of a repo, 2024-08-02)
and is pretty much independent from all other tests in the script
(it can even run standalone with everything before it removed).
The transport-helper.c:get_helper() function tries to write to the
helper. As we can see the helper script is very short and can exit
even before it reads anything, when get_helper() tries to give the
first command, "capabilities", the helper may already be gone.
A trivial fix, presented here, is to make sure that the helper reads
the first command it is given, as what it writes later is a response
to that command.
I however would wonder if the interactions with the helper initiated
by get_helper() should be done on a non-blocking I/O (we do check
the return value from our write(2) system calls, do we?).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When opening a bare repository like:
Git->repository(Directory => '/path/to/bare.git');
we will incorrectly point the repository object at the _current_
directory, not the one specified by the option.
The bug was introduced by 20da61f25f (Git.pm: trust rev-parse to find
bare repositories, 2022-10-22). Before then, we'd ask "rev-parse
--git-dir" if it was a Git repo, and if it returned anything, we'd
correctly convert that result to an absolute path using File::Spec and
Cwd::abs_path(). If it didn't, we'd guess it might be a bare repository
and find it ourselves, which was wrong (rev-parse should find even a
bare repo, and our search circumvented some of its rules).
That commit dropped most of the custom bare-repo search code in favor of
using "rev-parse --is-bare-repository" and trusting the "--git-dir" it
returned. But it mistakenly left some of the bare-repo code path in
place, which was now broken. That code calls Cwd::abs_path($dir); prior
to 20da61f25f $dir contained the "Directory" option the user passed in.
But afterwards, it contains the output of "rev-parse --git-dir". And
since our tentative rev-parse command is invoked after changing
directory, it will always be the relative path "."! So we'll end up with
the absolute path of the process's current directory, not the Directory
option the caller asked for.
So the non-bare case is correct, but the bare one is broken. Our tests
only check the non-bare one, so we didn't notice. We can fix this by
running the same absolute-path fixup code for both sides.
Helped-by: Rodrigo <rodrigolive@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building older versions of Git may require tweaking some build knobs. In
particular, very old versions of Git will fail to build with recent
OpenSSL, because the bignum type switched from a struct to a pointer.
The i5500 interop test uses Git v1.0.0 by default, which triggers this
problem. You can work around it by setting NO_OPENSSL in your
GIT_TEST_MAKE_OPTS variable. But there are two downsides:
1. You have to know to do this, and it's not at all obvious.
2. That sets the options for _all_ versions of Git that we build. And
it's possible for two versions to require conflicting knobs. E.g.,
building with "make NO_OPENSSL=Nope OPENSSL_SHA1=Yes" causes
imap-send.c to barf, because it declares a fallback typedef for SSL.
This is something we may want to fix, but of course many historical
versions are affected, and the interop scripts should be flexible
enough to build everything.
So let's introduce per-version make options, along with the ability for
scripts to specify knobs that match their default versions. That should
make everything build out of the box, but also allow testers flexibility
if they are testing interoperability between non-default versions.
We'll set NO_OPENSSL by default for v1.0.0 in i5500. It doesn't have to
worry about the conflict with OPENSSL_SHA1 because imap-send did not
exist back then (but if it did, it could also just explicitly use a
different hash implementation).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git cat-file" works well with the sparse-index, and gets marked as
such.
* kl/cat-file-on-sparse-index:
builtin/cat-file: mark 'git cat-file' sparse-index compatible
t1092: allow run_on_* functions to use standard input
One-line messages to "die" and other helper functions will get LF
added by these helper functions, but many existing messages had an
unnecessary LF at the end, which have been corrected.
* jk/messages-with-excess-lf-fix:
drop trailing newline from warning/error/die messages
"git pack-refs --auto" for the files backend was too aggressive,
which has been a bit tamed.
* ps/pack-refs-auto-heuristics:
refs/files: use heuristic to decide whether to repack with `--auto`
t0601: merge tests for auto-packing of refs
wrapper: introduce `log2u()`
A data corruption bug when multi-pack-index is used and the same
objects are stored in multiple packfiles has been corrected.
* tb/multi-pack-reuse-fix:
builtin/pack-objects.c: do not open-code `MAX_PACK_OBJECT_HEADER`
pack-bitmap.c: avoid repeated `pack_pos_to_offset()` during reuse
builtin/pack-objects.c: translate bit positions during pack-reuse
pack-bitmap: tag bitmapped packs with their corresponding MIDX
t/t5332-multi-pack-reuse.sh: verify pack generation with --strict
"git verify-pack" and "git index-pack" started dying outside a
repository, which has been corrected.
* ps/index-pack-outside-repo-fix:
builtin/index-pack: fix segfaults when running outside of a repo
"git bundle unbundle" outside a repository triggered a BUG()
unnecessarily, which has been corrected.
* ps/bundle-outside-repo-fix:
bundle: default to SHA1 when reading bundle headers
builtin/bundle: have unbundle check for repo before opening its bundle
The patch parser in "git patch-id" has been tightened to avoid
getting confused by lines that look like a patch header in the log
message.
cf. <Zqh2T_2RLt0SeKF7@tanuki>
* jc/patch-id:
patch-id: tighten code to detect the patch header
patch-id: rewrite code that detects the beginning of a patch
patch-id: make get_one_patchid() more extensible
patch-id: call flush_current_id() only when needed
t4204: patch-id supports various input format
In "environment.h" we have quite a lot of functions and variables that
either explicitly or implicitly depend on `the_repository`.
The implicit set of stateful declarations includes for example variables
which get populated when parsing a repository's Git configuration. This
set of variables is broken by design, as their state often depends on
the last repository config that has been parsed. So they may or may not
represent the state of `the_repository`.
Fixing that is quite a big undertaking, and later patches in this series
will demonstrate a solution for a first small set of those variables. So
for now, let's guard these with `USE_THE_REPOSITORY_VARIABLE` so that
callers are aware of the implicit dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `read_early_config()` function can be used to read configuration
where a repository has not yet been set up. As such, it is optional
whether or not `the_repository` has already been initialized. If it was
initialized we use its commondir and gitdir. If not, the function will
try to detect the Git directories by itself and, if found, also parse
their config files.
This means that we implicitly rely on `the_repository`. Make this
dependency explicit by passing a `struct repository`. This allows us to
again drop the `USE_THE_REPOSITORY_VARIABLE` define in "config.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After building Git with NO_LIBCURL, we're lacking `git remote-http` and
`git http-fetch`, so when we test that they trace as they should, we're
bound to fail. Add the LIBCURL prereq to those tests.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After building Git with NO_LIBCURL, there is no `git remote-http`, so
it's not meaningful to test that it can run outside of a repository.
Indeed, that test will fail. Add the LIBCURL prereq to it.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>