The git-diff(1) command can be used outside repositories to diff two
files with each other. But even if there is no repository we will end up
hashing the files that we are diffing so that we can print the "index"
line:
```
diff --git a/a b/b
index 7898192..6178079 100644
--- a/a
+++ b/b
@@ -1 +1 @@
-a
+b
```
We implicitly use SHA1 to calculate the hash here, which is because
`the_repository` gets initialized with SHA1 during the startup routine.
We are about to stop doing this though such that `the_repository` only
ever has a hash function when it was properly initialized via a repo's
configuration.
To give full control to our users, we would ideally add a new switch to
git-diff(1) that allows them to specify the hash function when executed
outside of a repository. But for now, we only convert the code to make
this explicit such that we can stop setting the default hash algorithm
for `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Verifying a bundle requires us to have a repository. This is encoded in
`verify_bundle()`, which will return an error if there is no repository.
We call `open_bundle()` before we call `verify_bundle()` though, which
already performs some verifications even though we may ultimately abort
due to a missing repository.
This is problematic because `open_bundle()` already reads the bundle
header and verifies that it contains a properly formatted hash. When
there is no repository we have no clue what hash function to expect
though, so we always end up assuming SHA1 here, which may or may not be
correct. Furthermore, we are about to stop initializing `the_hash_algo`
when there is no repository, which will lead to segfaults.
Check early on whether we have a repository to fix this issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We access `the_hash_algo` in git-blame(1) before we have executed
`parse_options_start()`, which may not be properly set up in case we
have no repository. This is fine for most of the part because all the
call paths that lead to it (git-blame(1), git-annotate(1) as well as
git-pick-axe(1)) specify `RUN_SETUP` and thus require a repository.
There is one exception though, namely when passing `-h` to print the
help. Here we will access `the_hash_algo` even if there is no repo.
This works fine right now because `the_hash_algo` gets sets up to point
to the SHA1 algorithm via `initialize_repository()`. But we're about to
stop doing this, and thus the code would lead to a `NULL` pointer
exception.
Prepare the code for this and only access `the_hash_algo` after we are
sure that there is a proper repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--short=` option for git-rev-parse(1) allows the user to specify
to how many characters object IDs should be shortened to. The option is
broken though for SHA256 repositories because we set the maximum allowed
hash size to `the_hash_algo->hexsz` before we have even set up the repo.
Consequently, `the_hash_algo` will always be SHA1 and thus we truncate
every hash after at most 40 characters.
Fix this by accessing `the_hash_algo` only after we have set up the
repo.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dumb HTTP transport tries to read the remote HEAD reference by
downloading the "HEAD" file and then parsing it via `http_fetch_ref()`.
This function will either parse the file as an object ID in case it is
exactly `the_hash_algo->hexsz` long, or otherwise it will check whether
the reference starts with "ref :" and parse it as a symbolic ref.
This is broken when parsing detached HEADs of a remote SHA256 repository
because we never update `the_hash_algo` to the discovered remote object
hash. Consequently, `the_hash_algo` will always be the fallback SHA1
hash algorithm, which will cause us to fail parsing HEAD altogteher when
it contains a SHA256 object ID.
Fix this issue by setting up `the_hash_algo` via `repo_set_hash_algo()`.
While at it, let's make the expected SHA1 fallback explicit in our code,
which also addresses an upcoming issue where we are going to remove the
SHA1 fallback for `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If either the `--attr-source` option or the `GIT_ATTR_SOURCE` envvar are
set, then `compute_default_attr_source()` will try to look up the value
as a treeish. It is possible to hit that function while outside of a Git
repository though, for example when using `git grep --no-index`. In that
case, Git will hit a bug because we try to look up the main ref store
outside of a repository.
Handle the case gracefully and detect when we try to look up an attr
source without a repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `default_attr_source()` function lazily computes the attr source
supposedly once, only. This is done via a static variable `attr_source`
that contains the resolved object ID of the attr source's tree. If the
variable is the null object ID then we try to look up the attr source,
otherwise we skip over it.
This approach is flawed though: the variable will never be set to
anything else but the null object ID in case there is no attr source.
Consequently, we re-compute the information on every call. And in the
worst case, when we silently ignore bad trees, this will cause us to try
and look up the treeish every single time.
Improve this by introducing a separate variable `has_attr_source` to
track whether we already computed the attr source and, if so, whether we
have an attr source or not.
This also allows us to convert the `ignore_bad_attr_tree` to not be
static anymore as the code will only be executed once anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `OPT__ABBREV()` option can be used to add an option that abbreviates
object IDs. When given a length longer than `the_hash_algo->hexsz`, then
it will instead set the length to that maximum length.
It may not always be guaranteed that we have `the_hash_algo` initialized
properly as the hash algorithm can only be set up after we have set up
`the_repository`. In that case, the hash would always be truncated to
the hex length of SHA1, which may not be what the user desires.
In practice it's not a problem as all commands that use `OPT__ABBREV()`
also have `RUN_SETUP` set and thus cannot work without a repository.
Consequently, both `the_repository` and `the_hash_algo` would be
properly set up.
Regardless of that, harden the code to not truncate the length when we
didn't set up a repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While `validate_headref()` is only called from `is_git_directory()` in
"setup.c", it is currently implemented in "path.c". Move it over such
that it becomes clear that it is only really used during setup in order
to discover repositories.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `validate_headref()` function takes a path to a supposed "HEAD" file
and checks whether its format is something that we understand. It is
used as part of our repository discovery to check whether a specific
directory is a Git directory or not.
Part of the validation is a check for a detached HEAD that contains a
plain object ID. To do this validation we use `get_oid_hex()`, which
relies on `the_hash_algo`. At this point in time the hash algo cannot
yet be initialized though because we didn't yet read the Git config.
Consequently, it will always be the SHA1 hash algorithm.
In practice this works alright because `get_oid_hex()` only ends up
checking whether the prefix of the buffer is a valid object ID. And
because SHA1 is shorter than SHA256, the function will successfully
parse SHA256 object IDs, as well.
It is somewhat fragile though and not really the intent to only check
for SHA1. With this in mind, harden the code to use `get_oid_hex_any()`
to check whether the "HEAD" file parses as any known hash.
One might be hard pressed to tighten the check even further and fully
validate the file contents, not only the prefix. In practice though that
wouldn't make a lot of sense as it could be that the repository uses a
hash function that produces longer hashes than SHA256, but which the
current version of Git doesn't understand yet. We'd still want to detect
the repository as proper Git repository in that case, and we will fail
eventually with a proper error message that the hash isn't understood
when trying to set up the repository format.
It follows that we could just leave the current code intact, as in
practice the code change doesn't have any user visible impact. But it
also prepares us for `the_hash_algo` being unset when there is no
repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/the-index-is-no-more:
repository: drop `initialize_the_repository()`
repository: drop `the_index` variable
builtin/clone: stop using `the_index`
repository: initialize index in `repo_init()`
builtin: stop using `the_index`
t/helper: stop using `the_index`
With 23865355 (attr: read attributes from HEAD when bare repo,
2023-10-13), we started to use the HEAD tree as the default
attribute source in a bare repository. One argument for such a
behaviour is that it would make things like "git archive" run in
bare and non-bare repositories for the same commit consistent.
This changes was merged to Git 2.43 but without an explicit mention
in its release notes.
It turns out that this change destroys performance of shallowly
cloning from a bare repository. As the "server" installations are
expected to be mostly bare, and "git pack-objects", which is the
core of driving the other side of "git clone" and "git fetch" wants
to see if a path is set not to delta with blobs from other paths via
the attribute system, the change forces the server side to traverse
the tree of the HEAD commit needlessly to find if each and every
paths the objects it sends out has the attribute that controls the
deltification. Given that (1) most projects do not configure such
an attribute, and (2) it is dubious for the server side to honor
such an end-user supplied attribute anyway, this was a poor choice
of the default.
To mitigate the current situation, let's revert the change that uses
the tree of HEAD in a bare repository by default as the attribute
source. This will help most people who have been happy with the
behaviour of Git 2.42 and before.
Two things to note:
* If you are stuck with versions of Git 2.43 or newer, that is
older than the release this fix appears in, you can explicitly
set the attr.tree configuration variable to point at an empty
tree object, i.e.
$ git config attr.tree 4b825dc642
* If you like the behaviour we are reverting, you can explicitly
set the attr.tree configuration variable to HEAD, i.e.
$ git config attr.tree HEAD
The right fix for this is to optimize the code paths that allow
accesses to attributes in tree objects, but that is a much more
involved change and is left as a longer-term project, outside the
scope of this "first step" fix.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A scheduled "git maintenance" job is expected to work on all
repositories it knows about, but it stopped at the first one that
errored out. Now it keeps going.
* js/for-each-repo-keep-going:
maintenance: running maintenance should not stop on errors
for-each-repo: optionally keep going on an error
In addition to building the objects needed, try to link the objects
that are used in fuzzer tests, to make sure at least they build
without bitrot, in Linux CI runs.
* js/build-fuzz-more-often:
fuzz: link fuzz programs with `make all` on Linux
Advertise "git contacts", a tool for newcomers to find people to
ask review for their patches, a bit more in our developer
documentation.
* la/doc-use-of-contacts-when-contributing:
SubmittingPatches: demonstrate using git-contacts with git-send-email
SubmittingPatches: add heading for format-patch and send-email
SubmittingPatches: dedupe discussion of security patches
SubmittingPatches: discuss reviewers first
SubmittingPatches: quote commands
SubmittingPatches: mention GitGitGadget
SubmittingPatches: clarify 'git-contacts' location
MyFirstContribution: mention contrib/contacts/git-contacts
The "--rfc" option of "git format-patch" learned to take an
optional string value to be used in place of "RFC" to tweak the
"[PATCH]" on the subject header.
* jc/format-patch-rfc-more:
format-patch: "--rfc=-(WIP)" appends to produce [PATCH (WIP)]
format-patch: allow --rfc to optionally take a value, like --rfc=WIP
The "-k" and "--rfc" options of "format-patch" will now error out
when used together, as one tells us not to add anything to the
title of the commit, and the other one tells us to add "RFC" in
addition to "PATCH".
* ds/format-patch-rfc-and-k:
format-patch: ensure that --rfc and -k are mutually exclusive
The procedure to build multi-pack-index got confused by the
replace-refs mechanism, which has been corrected by disabling the
latter.
* xx/disable-replace-when-building-midx:
midx: disable replace objects
"git rebase --signoff" used to forget that it needs to add a
sign-off to the resulting commit when told to continue after a
conflict stops its operation.
* pw/rebase-m-signoff-fix:
rebase -m: fix --signoff with conflicts
sequencer: store commit message in private context
sequencer: move current fixups to private context
sequencer: start removing private fields from public API
sequencer: always free "struct replay_opts"
Also fix some inconsistencies, and fix issue reported by
Anders Jonsson <anders.jonsson@norsjovallen.se>.
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Leakfix.
* rj/add-i-leak-fix:
add: plug a leak on interactive_add
add-patch: plug a leak handling the '/' command
add-interactive: plug a leak in get_untracked_files
apply: plug a leak in apply_data
Demote a BUG() to an die() when the failure from vsnprintf() may
not be due to a programmer error.
* rs/vsnprintf-failure-is-not-a-bug:
don't report vsnprintf(3) error as bug
Since 5e47215080 (fuzz: add basic fuzz testing target., 2018-10-12), we
have compiled object files for the fuzz tests as part of the default
'make all' target. This helps prevent bit-rot in lesser-used parts of
the codebase, by making sure that incompatible changes are caught at
build time.
However, since we never linked the fuzzer executables, this did not
protect us from link-time errors. As of 8b9a42bf48 (fuzz: fix fuzz test
build rules, 2024-01-19), it's now possible to link the fuzzer
executables without using a fuzzing engine and a variety of
compiler-specific (and compiler-version-specific) flags, at least on
Linux. So let's add a platform-specific option in config.mak.uname to
link the executables as part of the default `make all` target.
Since linking the fuzzer executables without a fuzzing engine does not
require a C++ compiler, we can change the FUZZ_PROGRAMS build rule to
use $(CC) by default. This avoids compiler mis-match issues when
overriding $(CC) but not $(CXX). When we *do* want to actually link with
a fuzzing engine, we can set $(FUZZ_CXX). The build instructions in the
CI fuzz-smoke-test job and in the Makefile comment have been updated
accordingly.
While we're at it, we can consolidate some of the fuzzer build
instructions into one location in the Makefile.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In https://github.com/microsoft/git/issues/623, it was reported that
maintenance stops on a missing repository, omitting the remaining
repositories that were scheduled for maintenance.
This is undesirable, as it should be a best effort type of operation.
It should still fail due to the missing repository, of course, but not
leave the non-missing repositories in unmaintained shapes.
Let's use `for-each-repo`'s shiny new `--keep-going` option that we just
introduced for that very purpose.
This change will be picked up when running `git maintenance start`,
which is run implicitly by `scalar reconfigure`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In https://github.com/microsoft/git/issues/623, it was reported that
the regularly scheduled maintenance stops if one repo in the middle of
the list was found to be missing.
This is undesirable, and points out a gap in the design of `git
for-each-repo`: We need a mode where that command does not stop on an
error, but continues to try running the specified command with the other
repositories.
Imitating the `--keep-going` option of GNU make, this commit teaches
`for-each-repo` the same trick: to continue with the operation on all
the remaining repositories in case there was a problem with one
repository, still setting the exit code to indicate an error occurred.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "receive-pack" program (which responds to "git push") was not
converted to run "git maintenance --auto" when other codepaths that
used to run "git gc --auto" were updated, which has been corrected.
* ps/run-auto-maintenance-in-receive-pack:
builtin/receive-pack: convert to use git-maintenance(1)
run-command: introduce function to prepare auto-maintenance process
When "git bisect" reports the commit it determined to be the
culprit, we used to show it in a format that does not honor common
UI tweaks, like log.date and log.decorate. The code has been
taught to use "git show" to follow more customizations.
* pk/bisect-use-show:
bisect: report the found commit with "show"