development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-30 14:03:28 +00:00

Author	SHA1	Message	Date
Patrick Steinhardt	390c5b07e2	t7300: assert exact states of repo Some of the tests in t7300 verify that git-clean(1) doesn't touch repositories that are embedded into the main repository. This is done by asserting a small set of substructures that are assumed to always exist, like the "refs/", "objects/" or "HEAD". This has the downside that we need to assume a specific repository structure that may be subject to change when new backends for the refdb land. At the same time, we don't thoroughly assert that git-clean(1) really didn't end up cleaning any files in the repository either. Convert the tests to instead assert that all files continue to exist after git-clean(1) by comparing a file listing via find(1) before and after executing clean. This makes our actual assertions stricter while having to care less about the repository's actual on-disk format. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-03 08:37:07 +09:00
Patrick Steinhardt	c6429fb867	t4207: delete replace references via git-update-ref(1) In t4207 we set up a set of replace objects via git-replace(1). Because these references should not be impacting subsequent tests we also set up some cleanup logic that deletes the replacement references via a call to `rm -rf`. This reaches into the internal implementation details of the reference backend and will thus break when we grow an alternative refdb implementation. Refactor the tests to delete the replacement refs via Git commands so that we become independent of the actual refdb that's in use. As we don't have a nice way to delete all replacements or all references in a certain namespace, we opt for a combination of git-for-each-ref(1) and git-update-ref(1)'s `--stdin` mode. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-03 08:37:07 +09:00
Patrick Steinhardt	c603138e3d	t1450: convert tests to remove worktrees via git-worktree(1) Some of our tests in t1450 create worktrees and then corrupt them. As it is impossible to delete such worktrees via a normal call to `git worktree remove`, we instead opt to remove them manually by calling rm(1) instead. This is ultimately unnecessary though as we can use the `-f` switch to remove the worktree. Let's convert the tests to do so such that we don't have to reach into internal implementation details of worktrees. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-03 08:37:06 +09:00
Patrick Steinhardt	668e31c690	t: convert tests to not access reflog via the filesystem Some of our tests reach directly into the filesystem in order to both read or modify the reflog, which will break once we have a second reference backend in our codebase that stores reflogs differently. Refactor these tests to either use git-reflog(1) or the ref-store test helper. Note that the refactoring to use git-reflog(1) also requires us to adapt our expectations in some cases where we previously verified the exact on-disk log entries. This seems like an acceptable tradeoff though to ensure that different backends have the same user-visible behaviour as any user would typically use git-reflog(1) anyway to access the logs. Any backend-specific verification of the written on-disk format should be implemented in a separate, backend-specific test. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-03 08:37:06 +09:00
Patrick Steinhardt	2393711681	t: convert tests to not access symrefs via the filesystem Some of our tests access symbolic references via the filesystem directly. While this works with the current files reference backend, it this will break once we have a second reference backend in our codebase. Refactor these tests to instead use git-symbolic-ref(1) or our `ref-store` test tool. The latter is required in some cases where safety checks of git-symbolic-ref(1) would otherwise reject writing a symbolic reference. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-03 08:37:06 +09:00
Patrick Steinhardt	1c6667cb9d	t: convert tests to not write references via the filesystem Some of our tests manually create, update or delete references by writing the respective new values into the filesystem directly. While this works with the current files reference backend, this will break once we have a second reference backend implementation in our codebase. Refactor these tests to instead use git-update-ref(1) or our `ref-store` test tool. The latter is required in some cases where safety checks of git-update-ref(1) would otherwise reject a reference update. While at it, refactor some of the tests to schedule the cleanup command via `test_when_finished` before modifying the repository. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-03 08:37:06 +09:00
Patrick Steinhardt	9ddd5b883b	t: allow skipping expected object ID in `ref-store update-ref` We require the caller to pass both the old and new expected object ID to our `test-tool ref-store update-ref` helper. When trying to update a symbolic reference though it's impossible to specify the expected object ID, which means that the test would instead have to force-update the reference. This is currently impossible though. Update the helper to optionally skip verification of the old object ID in case the test passes in an empty old object ID as input. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-03 08:37:06 +09:00
Junio C Hamano	d5dbad0c76	Merge branch 'ps/show-ref' into ps/ref-tests-update * ps/show-ref: t: use git-show-ref(1) to check for ref existence builtin/show-ref: add new mode to check for reference existence builtin/show-ref: explicitly spell out different modes in synopsis builtin/show-ref: ensure mutual exclusiveness of subcommands builtin/show-ref: refactor options for patterns subcommand builtin/show-ref: stop using global vars for `show_one()` builtin/show-ref: stop using global variable to count matches builtin/show-ref: refactor `--exclude-existing` options builtin/show-ref: fix dead code when passing patterns builtin/show-ref: fix leaking string buffer builtin/show-ref: split up different subcommands builtin/show-ref: convert pattern to a local variable	2023-11-03 08:36:54 +09:00
Junio C Hamano	6789275d37	tests: teach callers of test_i18ngrep to use test_grep They are equivalents and the former still exists, so as long as the only change this commit makes are to rewrite test_i18ngrep to test_grep, there won't be any new bug, even if there still are callers of test_i18ngrep remaining in the tree, or when merged to other topics that add new uses of test_i18ngrep. This patch was produced more or less with git grep -l -e 'test_i18ngrep ' 't/t[0-9][0-9][0-9][0-9]-.sh' \| xargs perl -p -i -e 's/test_i18ngrep /test_grep /' and a good way to sanity check the result yourself is to run the above in a checkout of c4603c1c (test framework: further deprecate test_i18ngrep, 2023-10-31) and compare the resulting working tree contents with the result of applying this patch to the same commit. You'll see that test_i18ngrep in a few t/lib-.sh files corrected, in addition to the manual reproduction. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-02 17:13:44 +09:00
Junio C Hamano	f92cea12c9	Merge branch 'jk/decoration-and-other-leak-fixes' into maint-2.42 Leakfix. * jk/decoration-and-other-leak-fixes: daemon: free listen_addr before returning revision: clear decoration structs during release_revisions() decorate: add clear_decoration() function	2023-11-02 16:53:26 +09:00
Junio C Hamano	71c614b9a2	Merge branch 'ob/t3404-typofix' into maint-2.42 Code clean-up. * ob/t3404-typofix: t3404-rebase-interactive.sh: fix typos in title of a rewording test	2023-11-02 16:53:24 +09:00
Junio C Hamano	535b30eb58	Merge branch 'bc/more-git-var' into maint-2.42 Fix-up for a topic that already has graduated. * bc/more-git-var: var: avoid a segmentation fault when `HOME` is unset	2023-11-02 16:53:23 +09:00
Junio C Hamano	43af21409e	Merge branch 'ch/t6300-verify-commit-test-cleanup' into maint-2.42 Test clean-up. * ch/t6300-verify-commit-test-cleanup: t/t6300: drop magic filtering t/lib-gpg: forcibly run a trustdb update	2023-11-02 16:53:22 +09:00
Junio C Hamano	8db7d2d6bd	Merge branch 'ob/t9001-indent-fix' into maint-2.42 Test style fix. * ob/t9001-indent-fix: t9001: fix indentation in test_no_confirm()	2023-11-02 16:53:21 +09:00
Junio C Hamano	07011e1480	Merge branch 'jk/test-pass-ubsan-options-to-http-test' into maint-2.42 UBSAN options were not propagated through the test framework to git run via the httpd, unlike ASAN options, which has been corrected. * jk/test-pass-ubsan-options-to-http-test: test-lib: set UBSAN_OPTIONS to match ASan	2023-11-02 16:53:20 +09:00
Junio C Hamano	2fd4378d64	Merge branch 'js/diff-cached-fsmonitor-fix' into maint-2.42 "git diff --cached" codepath did not fill the necessary stat information for a file when fsmonitor knows it is clean and ended up behaving as if it is not clean, which has been corrected. * js/diff-cached-fsmonitor-fix: diff-lib: fix check_removed when fsmonitor is on	2023-11-02 16:53:19 +09:00
Junio C Hamano	bfb8376d68	Merge branch 'pw/diff-no-index-from-named-pipes' into maint-2.42 "git diff --no-index -R <(one) <(two)" did not work correctly, which has been corrected. * pw/diff-no-index-from-named-pipes: diff --no-index: fix -R with stdin	2023-11-02 16:53:18 +09:00
Junio C Hamano	d12df942ba	Merge branch 'js/complete-checkout-t' into maint-2.42 The completion script (in contrib/) has been taught to treat the "-t" option to "git checkout" and "git switch" just like the "--track" option, to complete remote-tracking branches. * js/complete-checkout-t: completion(switch/checkout): treat --track and -t the same	2023-11-02 16:53:18 +09:00
Junio C Hamano	a70f725c06	Merge branch 'pw/rebase-i-after-failure' into maint-2.42 Various fixes to the behaviour of "rebase -i" when the command got interrupted by conflicting changes. cf. <6b927687-cf6e-d73e-78fb-bd4f46736928@gmx.de> * pw/rebase-i-after-failure: rebase -i: fix adding failed command to the todo list rebase --continue: refuse to commit after failed command rebase: fix rewritten list for failed pick sequencer: factor out part of pick_commits() sequencer: use rebase_path_message() rebase -i: remove patch file after conflict resolution rebase -i: move unlink() calls	2023-11-02 16:53:17 +09:00
Junio C Hamano	d97034b0a1	Merge branch 'ks/ref-filter-sort-numerically' into maint-2.42 "git for-each-ref --sort='contents:size'" sorts the refs according to size numerically, giving a ref that points at a blob twelve-byte (12) long before showing a blob hundred-byte (100) long. * ks/ref-filter-sort-numerically: ref-filter: sort numerically when ":size" is used	2023-11-02 16:53:17 +09:00
Junio C Hamano	57b52cec46	Merge branch 'jk/diff-result-code-cleanup' into maint-2.42 "git diff --no-such-option" and other corner cases around the exit status of the "diff" command has been corrected. * jk/diff-result-code-cleanup: diff: drop useless "status" parameter from diff_result_code() diff: drop useless return values in git-diff helpers diff: drop useless return from run_diff_{files,index} functions diff: die when failing to read index in git-diff builtin diff: show usage for unknown builtin_diff_files() options diff-files: avoid negative exit value diff: spell DIFF_INDEX_CACHED out when calling run_diff_index()	2023-11-02 16:53:16 +09:00
Junio C Hamano	7f8314f277	Merge branch 'ts/unpacklimit-config-fix' into maint-2.42 transfer.unpackLimit ought to be used as a fallback, but overrode fetch.unpackLimit and receive.unpackLimit instead. * ts/unpacklimit-config-fix: transfer.unpackLimit: fetch/receive.unpackLimit takes precedence	2023-11-02 16:53:16 +09:00
Junio C Hamano	8764491463	Merge branch 'jc/diff-exit-code-with-w-fixes' into maint-2.42 "git diff -w --exit-code" with various options did not work correctly, which is being addressed. * jc/diff-exit-code-with-w-fixes: diff: the -w option breaks --exit-code for --raw and other output modes t4040: remove test that succeeded for a wrong reason diff: teach "--stat -w --exit-code" to notice differences diff: mode-only change should be noticed by "--patch -w --exit-code" diff: move the fallback "--exit-code" code down	2023-11-02 16:53:15 +09:00
Junio C Hamano	1ea39ad467	Merge branch 'tb/commit-graph-verify-fix' into maint-2.42 The commit-graph verification code that detects mixture of zero and non-zero generation numbers has been updated. * tb/commit-graph-verify-fix: commit-graph: avoid repeated mixed generation number warnings t/t5318-commit-graph.sh: test generation zero transitions during fsck commit-graph: verify swapped zero/non-zero generation cases commit-graph: introduce `commit_graph_generation_from_graph()`	2023-11-02 16:53:15 +09:00
Junio C Hamano	50758312f2	Merge branch 'ds/scalar-updates' into maint-2.42 Scalar updates. * ds/scalar-updates: scalar reconfigure: help users remove buggy repos setup: add discover_git_directory_reason() scalar: add --[no-]src option	2023-11-02 16:53:15 +09:00
Junio C Hamano	396a167bd4	Merge branch 'mp/rebase-label-length-limit' into maint-2.42 Overly long label names used in the sequencer machinery are now chopped to fit under filesystem limitation. * mp/rebase-label-length-limit: rebase: allow overriding the maximal length of the generated labels sequencer: truncate labels to accommodate loose refs	2023-11-02 16:53:14 +09:00
Junio C Hamano	6d68ab0819	Merge branch 'jk/test-lsan-denoise-output' into maint-2.42 Tests with LSan from time to time seem to emit harmless message that makes our tests unnecessarily flakey; we work it around by filtering the uninteresting output. * jk/test-lsan-denoise-output: test-lib: ignore uninteresting LSan output	2023-11-02 16:53:14 +09:00
brian m. carlson	e1068f0ad4	merge-file: add an option to process object IDs git merge-file knows how to merge files on the file system already. It would be helpful, however, to allow it to also merge single blobs. Teach it an `--object-id` option which means that its arguments are object IDs and not files to allow it to do so. We handle the empty blob specially since read_mmblob doesn't read it directly and otherwise users cannot specify an empty ancestor. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-02 08:51:40 +09:00
Patrick Steinhardt	0497e6c611	t: use git-show-ref(1) to check for ref existence Convert tests that use `test_path_is_file` and `test_path_is_missing` to instead use a set of helpers `test_ref_exists` and `test_ref_missing`. These helpers are implemented via the newly introduced `git show-ref --exists` command. Thus, we can avoid intimate knowledge of how the ref backend stores references on disk. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-01 12:09:01 +09:00
Patrick Steinhardt	9080a7f178	builtin/show-ref: add new mode to check for reference existence While we have multiple ways to show the value of a given reference, we do not have any way to check whether a reference exists at all. While commands like git-rev-parse(1) or git-show-ref(1) can be used to check for reference existence in case the reference resolves to something sane, neither of them can be used to check for existence in some other scenarios where the reference does not resolve cleanly: - References which have an invalid name cannot be resolved. - References to nonexistent objects cannot be resolved. - Dangling symrefs can be resolved via git-symbolic-ref(1), but this requires the caller to special case existence checks depending on whether or not a reference is symbolic or direct. Furthermore, git-rev-list(1) and other commands do not let the caller distinguish easily between an actually missing reference and a generic error. Taken together, this seems like sufficient motivation to introduce a separate plumbing command to explicitly check for the existence of a reference without trying to resolve its contents. This new command comes in the form of `git show-ref --exists`. This new mode will exit successfully when the reference exists, with a specific exit code of 2 when it does not exist, or with 1 when there has been a generic error. Note that the only way to properly implement this command is by using the internal `refs_read_raw_ref()` function. While the public function `refs_resolve_ref_unsafe()` can be made to behave in the same way by passing various flags, it does not provide any way to obtain the errno with which the reference backend failed when reading the reference. As such, it becomes impossible for us to distinguish generic errors from the explicit case where the reference wasn't found. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-01 12:09:01 +09:00
Patrick Steinhardt	199970e72f	builtin/show-ref: ensure mutual exclusiveness of subcommands The git-show-ref(1) command has three different modes, of which one is implicit and the other two can be chosen explicitly by passing a flag. But while these modes are standalone and cause us to execute completely separate code paths, we gladly accept the case where a user asks for both `--exclude-existing` and `--verify` at the same time even though it is not obvious what will happen. Spoiler: we ignore `--verify` and execute the `--exclude-existing` mode. Let's explicitly detect this invalid usage and die in case both modes were requested. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-01 12:09:00 +09:00
Karthik Nayak	9830926c7d	rev-list: add commit object support in `--missing` option The `--missing` object option in rev-list currently works only with missing blobs/trees. For missing commits the revision walker fails with a fatal error. Let's extend the functionality of `--missing` option to also support commit objects. This is done by adding a `missing_objects` field to `rev_info`. This field is an `oidset` to which we'll add the missing commits as we encounter them. The revision walker will now continue the traversal and call `show_commit()` even for missing commits. In rev-list we can then check if the commit is a missing commit and call the existing code for parsing `--missing` objects. A scenario where this option would be used is to find the boundary objects between different object directories. Consider a repository with a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a repository, using the `--missing=print` option while disabling the alternate object directory allows us to find the boundary objects between the main and alternate object directory. Helped-by: Patrick Steinhardt <ps@pks.im> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-01 12:07:18 +09:00
Junio C Hamano	922cc26e41	Merge branch 'ps/do-not-trust-commit-graph-blindly-for-existence' into kn/rev-list-missing-fix * ps/do-not-trust-commit-graph-blindly-for-existence: commit: detect commits that exist in commit-graph but not in the ODB commit-graph: introduce envvar to disable commit existence checks	2023-11-01 12:06:55 +09:00
Patrick Steinhardt	7a5d604443	commit: detect commits that exist in commit-graph but not in the ODB Commit graphs can become stale and contain references to commits that do not exist in the object database anymore. Theoretically, this can lead to a scenario where we are able to successfully look up any such commit via the commit graph even though such a lookup would fail if done via the object database directly. As the commit graph is mostly intended as a sort of cache to speed up parsing of commits we do not want to have diverging behaviour in a repository with and a repository without commit graphs, no matter whether they are stale or not. As commits are otherwise immutable, the only thing that we really need to care about is thus the presence or absence of a commit. To address potentially stale commit data that may exist in the graph, our `lookup_commit_in_graph()` function will check for the commit's existence in both the commit graph, but also in the object database. So even if we were able to look up the commit's data in the graph, we would still pretend as if the commit didn't exist if it is missing in the object database. We don't have the same safety net in `parse_commit_in_graph_one()` though. This function is mostly used internally in "commit-graph.c" itself to validate the commit graph, and this usage is fine. We do expose its functionality via `parse_commit_in_graph()` though, which gets called by `repo_parse_commit_internal()`, and that function is in turn used in many places in our codebase. For all I can see this function is never used to directly turn an object ID into a commit object without additional safety checks before or after this lookup. What it is being used for though is to walk history via the parent chain of commits. So when commits in the parent chain of a graph walk are missing it is possible that we wouldn't notice if that missing commit was part of the commit graph. Thus, a query like `git rev-parse HEAD~2` can succeed even if the intermittent commit is missing. It's unclear whether there are additional ways in which such stale commit graphs can lead to problems. In any case, it feels like this is a bigger bug waiting to happen when we gain additional direct or indirect callers of `repo_parse_commit_internal()`. So let's fix the inconsistent behaviour by checking for object existence via the object database, as well. This check of course comes with a performance penalty. The following benchmarks have been executed in a clone of linux.git with stable tags added: Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] Range (min … max): 2.894 s … 2.950 s 10 runs Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] Range (min … max): 3.780 s … 3.961 s 10 runs Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] Range (min … max): 13.714 s … 13.995 s 10 runs Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] Range (min … max): 13.645 s … 14.038 s 10 runs Summary git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) We look at a ~30% regression in general, but in general we're still a whole lot faster than without the commit graph. To counteract this, the new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-01 12:04:06 +09:00
Patrick Steinhardt	e04838ea82	commit-graph: introduce envvar to disable commit existence checks Our `lookup_commit_in_graph()` helper tries to look up commits from the commit graph and, if it doesn't exist there, falls back to parsing it from the object database instead. This is intended to speed up the lookup of any such commit that exists in the database. There is an edge case though where the commit exists in the graph, but not in the object database. To avoid returning such stale commits the helper function thus double checks that any such commit parsed from the graph also exists in the object database. This makes the function safe to use even when commit graphs aren't updated regularly. We're about to introduce the same pattern into other parts of our code base though, namely `repo_parse_commit_internal()`. Here the extra sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was a newly introduced helper, and as such there was no performance hit by adding this sanity check. If we added `repo_parse_commit_internal()` with that sanity check right from the beginning as well, this would probably never have been an issue to begin with. But by retrofitting it with this sanity check now we do add a performance regression to preexisting code, and thus there is a desire to avoid this or at least give an escape hatch. In practice, there is no inherent reason why either of those functions should have the sanity check whereas the other one does not: either both of them are able to detect this issue or none of them should be. This also means that the default of whether we do the check should likely be the same for both. To err on the side of caution, we thus rather want to make `repo_parse_commit_internal()` stricter than to loosen the checks that we already have in `lookup_commit_in_graph()`. The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is the default, we will double check that commits looked up in the commit graph via `lookup_commit_in_graph()` also exist in the object database. This same check will also be added in `repo_parse_commit_internal()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-01 12:04:06 +09:00
Junio C Hamano	2e87fca189	test framework: further deprecate test_i18ngrep As an attempt to come up with a useful mechanism to ensure that certain messages are left untranslated [], we earlier wrote GIT_TEST_GETTEXT_POISON off as a failed experiment. But the output from the test helper was easier to use while debugging failed tests, compared to the same test writtein with the plain-vanilla "grep". Especially when a test that expects a certain string to appear in the output (e.g. "this test must fail with this message") fails, "grep message output" would just silently fail and in a &&-chained sequence of commands, it is hard to tell which step failed. test_i18ngrep explicitly said "we wanted to see a line that match this pattern but did not see a hit in this file". What we have as test_i18ngrep in our tree still retains this verbose output (even though we got rid of the "poison" support). Let's rename it to test_grep (because it is no longer about i18n at all) and then make test_i18ngrep a thin wrapper around it. Existing callers of test_i18ngrep can be mechanically rewritten to instead use test_grep over time, but it does not have to be done in this commit. [Footnote] The idea was that human-facing messages are often translated, but there are messages that should never be translated. We use "grep" only for the latter kind of messages, and then run tests in "poison" mode that spew garbage for translatable messages. If such a test run fails, it means these messages tested with "grep" were marked for translation by mistake. test_i18ngrep was to be used for other messages that are to be translated, and was to always "succeed" when runing under the "poison" mode. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-31 14:24:35 +09:00
Junio C Hamano	ece54894fe	Merge branch 'ii/branch-error-messages-update' Error message update. * ii/branch-error-messages-update: builtin/branch.c: adjust error messages to coding guidelines	2023-10-31 12:57:44 +09:00
Robert Coup	b8f58c200c	upload-pack: add tracing for fetches Information on how users are accessing hosted repositories can be helpful to server operators. For example, being able to broadly differentiate between fetches and initial clones; the use of shallow repository features; or partial clone filters. `a29263c` (fetch-pack: add tracing for negotiation rounds, 2022-08-02) added some information on have counts to fetch-pack itself to help diagnose negotiation; but from a git-upload-pack (server) perspective, there's no means of accessing such information without using GIT_TRACE_PACKET to examine the protocol packets. Improve this by emitting a Trace2 JSON event from upload-pack with summary information on the contents of a fetch request. * haves, wants, and want-ref counts can help determine (broadly) between fetches and clones, and the use of single-branch, etc. * shallow clone depth, tip counts, and deepening options. * any partial clone filter type. Signed-off-by: Robert Coup <robert@coup.net.nz> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-30 21:43:21 +09:00
Junio C Hamano	5006bfc1f5	Merge branch 'jk/send-email-fix-addresses-from-composed-messages' The codepath to handle recipient addresses `git send-email --compose` learns from the user was completely broken, which has been corrected. * jk/send-email-fix-addresses-from-composed-messages: send-email: handle to/cc/bcc from --compose message Revert "send-email: extract email-parsing code into a subroutine" doc/send-email: mention handling of "reply-to" with --compose	2023-10-30 07:09:59 +09:00
Junio C Hamano	64912cc023	Merge branch 'kh/pathspec-error-wo-repository-fix' The pathspec code carelessly dereferenced NULL while emitting an error message, which has been corrected. * kh/pathspec-error-wo-repository-fix: grep: die gracefully when outside repository	2023-10-30 07:09:57 +09:00
Junio C Hamano	3a5e77e346	Merge branch 'da/t7601-style-fix' Coding style update. * da/t7601-style-fix: t7601: use "test_path_is_file" etc. instead of "test -f"	2023-10-30 07:09:56 +09:00
Junio C Hamano	26dd307cfa	Merge branch 'jc/attr-tree-config' The attribute subsystem learned to honor `attr.tree` configuration that specifies which tree to read the .gitattributes files from. * jc/attr-tree-config: attr: add attr.tree for setting the treeish to read attributes from attr: read attributes from HEAD when bare repo	2023-10-30 07:09:55 +09:00
Junio C Hamano	8183b63ff6	Merge branch 'sn/typo-grammo-phraso-fixes' Many typos, ungrammatical sentences and wrong phrasing have been fixed. * sn/typo-grammo-phraso-fixes: t/README: fix multi-prerequisite example doc/gitk: s/sticked/stuck/ git-jump: admit to passing merge mode args to ls-files doc/diff-options: improve wording of the log.diffMerges mention doc: fix some typos, grammar and wording issues	2023-10-30 07:09:55 +09:00
René Scharfe	26d4c51d36	reflog: fix expire --single-worktree `33d7bdd645` (builtin/reflog.c: use parse-options api for expire, delete subcommands, 2022-01-06) broke the option --single-worktree of git reflog expire and added a non-printable short flag for it, presumably by accident. While before it set the variable "all_worktrees" to 0, now it sets it to 1, its default value. --no-single-worktree is required now to set it to 0. Fix it by replacing the variable with one that has the opposite meaning, to avoid the negation and its potential for confusion. The new variable "single_worktree" directly captures whether --single-worktree was given. Also remove the unprintable short flag SOH (start of heading) because it is undocumented, hard to use and is likely to have been added by mistake in connection with the negation bug above. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-29 12:19:28 +09:00
René Scharfe	0025dde775	parse-options: make CMDMODE errors more precise Only a single PARSE_OPT_CMDMODE option can be specified for the same variable at the same time. This is enforced by get_value(), but the error messages are imprecise in three ways: 1. If a non-PARSE_OPT_CMDMODE option changes the value variable of a PARSE_OPT_CMDMODE option then an ominously vague message is shown: $ t/helper/test-tool parse-options --set23 --mode1 error: option `mode1' : incompatible with something else Worse: If the order of options is reversed then no error is reported at all: $ t/helper/test-tool parse-options --mode1 --set23 boolean: 0 integer: 23 magnitude: 0 timestamp: 0 string: (not set) abbrev: 7 verbose: -1 quiet: 0 dry run: no file: (not set) Fortunately this can currently only happen in the test helper; actual Git commands don't share the same variable for the value of options with and without the flag PARSE_OPT_CMDMODE. 2. If there are multiple options with the same value (synonyms), then the one that is defined first is shown rather than the one actually given on the command line, which is confusing: $ git am --resolved --quit error: option `quit' is incompatible with --continue 3. Arguments of PARSE_OPT_CMDMODE options are not handled by the parse-option machinery. This is left to the callback function. We currently only have a single affected option, --show-current-patch of git am. Errors for it can show an argument that was not actually given on the command line: $ git am --show-current-patch --show-current-patch=diff error: options '--show-current-patch=diff' and '--show-current-patch=raw' cannot be used together The options --show-current-patch and --show-current-patch=raw are synonyms, but the error accuses the user of input they did not actually made. Or it can awkwardly print a NULL pointer: $ git am --show-current-patch=diff --show-current-patch error: options '--show-current-patch=(null)' and '--show-current-patch=diff' cannot be used together The reasons for these shortcomings is that the current code checks incompatibility only when encountering a PARSE_OPT_CMDMODE option at the command line, and that it searches the previous incompatible option by value. Fix the first two points by checking all PARSE_OPT_CMDMODE variables after parsing each option and by storing all relevant details if their value changed. Do that whether or not the changing options has the flag PARSE_OPT_CMDMODE set. Report an incompatibility only if two options change the variable to different values and at least one of them is a PARSE_OPT_CMDMODE option. This changes the output of the first three examples above to: $ t/helper/test-tool parse-options --set23 --mode1 error: --mode1 is incompatible with --set23 $ t/helper/test-tool parse-options --mode1 --set23 error: --set23 is incompatible with --mode1 $ git am --resolved --quit error: --quit is incompatible with --resolved Store the argument of PARSE_OPT_CMDMODE options of type OPTION_CALLBACK as well to allow taking over the responsibility for compatibility checking from the callback function. The next patch will use this capability to fix the messages for git am --show-current-patch. Use a linked list for storing the PARSE_OPT_CMDMODE variables. This somewhat outdated data structure is simple and suffices, as the number of elements per command is currently only zero or one. We do support multiple different command modes variables per command, but I don't expect that we'd ever use a significant number of them. Once we do we can switch to a hashmap. Since we no longer need to search the conflicting option, the all_opts parameter of get_value() is no longer used. Remove it. Extend the tests to check for both conflicting option names, but don't insist on a particular order. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-29 09:15:18 +09:00
Emily Shaffer	681c0a247b	bugreport: reject positional arguments git-bugreport already rejected unrecognized flag arguments, like `--diaggnose`, but this doesn't help if the user's mistake was to forget the `--` in front of the argument. This can result in a user's intended argument not being parsed with no indication to the user that something went wrong. Since git-bugreport presently doesn't take any positionals at all, let's reject all positionals and give the user a usage hint. Signed-off-by: Emily Shaffer <nasamuffin@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-29 08:56:17 +09:00
Emily Shaffer	831401bb14	t0091-bugreport: stop using i18ngrep Since `e6545201ad` (Merge branch 'ab/detox-config-gettext', 2021-04-13), test_i18ngrep is no longer required. Quit using it in the bugreport tests, since it's setting a bad example for tests added later. Signed-off-by: Emily Shaffer <nasamuffin@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-29 08:55:48 +09:00
Michael Strawbridge	0d8647034e	send-email: move validation code below process_address_list Move validation logic below processing of email address lists so that email validation gets the proper email addresses. As a side effect, some initialization needed to be moved down. In order for validation and the actual email sending to have the same initial state, the initialized variables that get modified by pre_process_file are encapsulated in a new function. This fixes email address validation errors when the optional perl module Email::Valid is installed and multiple addresses are passed in on a single to/cc argument like --to=foo@example.com,bar@example.com. A new test was added to t9001 to expose failures with this case in the future. Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Michael Strawbridge <michael.strawbridge@amd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-26 21:46:10 +09:00
Junio C Hamano	5edbcead42	Merge branch 'bc/racy-4gb-files' The index file has room only for lower 32-bit of the file size in the cached stat information, which means cached stat information will have 0 in its sd_size member for a file whose size is multiple of 4GiB. This is mistaken for a racily clean path. Avoid it by storing a bogus sd_size value instead for such files. * bc/racy-4gb-files: Prevent git from rehashing 4GiB files t: add a test helper to truncate files	2023-10-23 13:56:37 -07:00
Junio C Hamano	626f689f79	Merge branch 'jc/fail-stash-to-store-non-stash' Feeding "git stash store" with a random commit that was not created by "git stash create" now errors out. * jc/fail-stash-to-store-non-stash: stash: be careful what we store	2023-10-23 13:56:37 -07:00
Junio C Hamano	755fb09163	Merge branch 'so/diff-merges-dd' "git log" and friends learned "--dd" that is a short-hand for "--diff-merges=first-parent -p". * so/diff-merges-dd: completion: complete '--dd' diff-merges: introduce '--dd' option diff-merges: improve --diff-merges documentation	2023-10-23 13:56:37 -07:00
Junio C Hamano	f32af12cee	Merge branch 'jk/chunk-bounds' The codepaths that read "chunk" formatted files have been corrected to pay attention to the chunk size and notice broken files. * jk/chunk-bounds: (21 commits) t5319: make corrupted large-offset test more robust chunk-format: drop pair_chunk_unsafe() commit-graph: detect out-of-order BIDX offsets commit-graph: check bounds when accessing BIDX chunk commit-graph: check bounds when accessing BDAT chunk commit-graph: bounds-check generation overflow chunk commit-graph: check size of generations chunk commit-graph: bounds-check base graphs chunk commit-graph: detect out-of-bounds extra-edges pointers commit-graph: check size of commit data chunk midx: check size of revindex chunk midx: bounds-check large offset chunk midx: check size of object offset chunk midx: enforce chunk alignment on reading midx: check size of pack names chunk commit-graph: check consistency of fanout table midx: check size of oid lookup chunk commit-graph: check size of oid fanout chunk midx: stop ignoring malformed oid fanout chunk t: add library for munging chunk-format files ...	2023-10-23 13:56:36 -07:00
Isoken June Ibizugbe	12b99928c8	builtin/branch.c: adjust error messages to coding guidelines As per the CodingGuidelines document, it is recommended that error messages such as die(), error() and warning(), should start with a lowercase letter and should not end with a period. This patch adjusts tests to match updated messages. Signed-off-by: Isoken June Ibizugbe <isokenjune@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-23 12:22:57 -07:00
Junio C Hamano	92741d83c0	Merge branch 'ak/pretty-decorate-more-fix' Unlike "git log --pretty=%D", "git log --pretty="%(decorate)" did not auto-initialize the decoration subsystem, which has been corrected. * ak/pretty-decorate-more-fix: pretty: fix ref filtering for %(decorate) formats	2023-10-20 16:23:11 -07:00
Junio C Hamano	6b1e2254d6	Merge branch 'vd/loose-ref-iteration-optimization' The code to iterate over loose references have been optimized to reduce the number of lstat() system calls. * vd/loose-ref-iteration-optimization: files-backend.c: avoid stat in 'loose_fill_ref_dir' dir.[ch]: add 'follow_symlink' arg to 'get_dtype' dir.[ch]: expose 'get_dtype' ref-cache.c: fix prefix matching in ref iteration	2023-10-20 16:23:11 -07:00
Junio C Hamano	c662038629	Merge branch 'ty/merge-tree-strategy-options' "git merge-tree" learned to take strategy backend specific options via the "-X" option, like "git merge" does. * ty/merge-tree-strategy-options: merge: introduce {copy\|clear}_merge_options() merge-tree: add -X strategy option	2023-10-20 16:23:11 -07:00
Jeff King	3ec6167567	send-email: handle to/cc/bcc from --compose message If the user writes a message via --compose, send-email will pick up various headers like "From", "Subject", etc and use them for other patches as if they were specified on the command-line. But we don't handle "To", "Cc", or "Bcc" this way; we just tell the user "those aren't interpeted yet" and ignore them. But it seems like an obvious thing to want, especially as the same feature exists when the cover letter is generated separately by format-patch. There it is gated behind the --to-cover option, but I don't think we'd need the same control here; since we generate the --compose template ourselves based on the existing input, if the user leaves the lines unchanged then the behavior remains the same. So let's fill in the implementation; like those other headers we already handle, we just need to assign to the initial_* variables. The only difference in this case is that they are arrays, so we'll feed them through parse_address_line() to split them (just like we would when reading a single string via prompting). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-20 14:31:39 -07:00
Jeff King	637e8944a1	Revert "send-email: extract email-parsing code into a subroutine" This reverts commit `b6049542b9`. Prior to that commit, we read the results of the user editing the "--compose" message in a loop, picking out parts we cared about, and streaming the result out to a ".final" file. That commit split the reading/interpreting into two phases; we'd now read into a hash, and then pick things out of the hash. The goal was making the code more readable. And in some ways it did, because the ugly regexes are confined to the reading phase. But it also introduced several bugs, because now the two phases need to match each other. In particular: - we pick out headers like "Subject: foo" with a case-insensitive regex, and then use the user-provided header name as the key in a case-sensitive hash. So if the user wrote "subject: foo", we'd no longer recognize it as a subject. - the namespace for the hash keys conflates header names with meta information like "body". If you put "body: foo" in your message, it would be misinterpreted as the actual message body (nobody is likely to do that in practice, but it seems like an unnecessary danger). - the handling for to/cc/bcc is totally broken. The behavior before that commit is to recognize and skip those headers, with a note to the user that they are not yet handled. Not great, but OK. But after the patch, the reading side now splits the addresses into a perl array-ref. But the interpreting side doesn't handle this at all, and blindly prints the stringified array-ref value. This leads to garbage like: (mbox) Adding to: ARRAY (0x555b4345c428) from line 'To: ARRAY(0x555b4345c428)' error: unable to extract a valid address from: ARRAY (0x555b4345c428) What to do with this address? ([q]uit\|[d]rop\|[e]dit): Probably not a huge deal, since nobody should even try to use those headers in the first place (since they were not implemented). But the new behavior is worse, and indicative of the sorts of problems that come from having the two layers. The revert had a few conflicts, due to later work in this area from `15dc3b9161` (send-email: rename variable for clarity, 2018-03-04) and `d11c943c78` (send-email: support separate Reply-To address, 2018-03-04). I've ported the changes from those commits over as part of the conflict resolution. The new tests show the bugs. Note the use of GIT_SEND_EMAIL_NOTTY in the second one. Without it, the test is happy to reach outside the test harness to the developer's actual terminal (when run with the buggy state before this patch). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-20 14:31:32 -07:00
Kristoffer Haugsbakk	b1688ea02d	grep: die gracefully when outside repository Die gracefully when `git grep --no-index` is run outside of a Git repository and the path is outside the directory tree. If you are not in a Git repository and say: git grep --no-index search .. You trigger a `BUG`: BUG: environment.c:213: git environment hasn't been setup Aborted (core dumped) Because `..` is a valid path which is treated as a pathspec. Then `pathspec` figures out that it is not in the current directory tree. The `BUG` is triggered when `pathspec` tries to advise the user about how the path is not in the current (non-existing) repository. Reported-by: ks1322 ks1322 <ks1322@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-20 11:06:45 -07:00
Dorcas AnonoLitunya	5abb758118	t7601: use "test_path_is_file" etc. instead of "test -f" Some tests in t7601 use "test -f" and "test ! -f" to see if a path exists or is missing. Use test_path_is_file and test_path_is_missing helper functions to clarify these tests a bit better. This especially matters for the "missing" case because "test ! -f F" will be happy if "F" exists as a directory, but the intent of the test is that "F" should not exist, even as a directory. The updated code expresses this better. Signed-off-by: Dorcas AnonoLitunya <anonolitunya@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-18 16:57:49 -07:00
Junio C Hamano	7906b5c957	Merge branch 'jc/merge-ort-attr-index-fix' Fix "git merge-tree" to stop segfaulting when the --attr-source option is used. * jc/merge-ort-attr-index-fix: merge-ort: initialize repo in index state	2023-10-18 13:25:42 -07:00
Junio C Hamano	79861babe2	Merge branch 'tb/repack-max-cruft-size' "git repack" learned "--max-cruft-size" to prevent cruft packs from growing without bounds. * tb/repack-max-cruft-size: repack: free existing_cruft array after use builtin/repack.c: avoid making cruft packs preferred builtin/repack.c: implement support for `--max-cruft-size` builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE t7700: split cruft-related tests to t7704	2023-10-18 13:25:41 -07:00
Jeff King	7538f9d89b	t5319: make corrupted large-offset test more robust The test t5319.88 ("reader bounds-checks large offset table") can fail intermittently. The failure mode looks like this: 1. An earlier test sets up "objects64", a directory that can be used to produce a midx with a corrupted large-offsets table. To get the large offsets, it corrupts the normal ".idx" file to have a fake large offset, and then builds a midx from that. That midx now has a large offset table, which is what we want. But we also have a .idx on disk that has a corrupted entry. We'll call the object with the corrupted large-offset "X". 2. In t5319.88, we further corrupt the midx by reducing the size of the large-offset chunk (because our goal is to make sure we do not do an out-of-bounds read on it). 3. We then enumerate all of the objects with "cat-file --batch-check --batch-all-objects", expecting to see a complaint when we try to show object X. We use --batch-all-objects because our objects64 repo doesn't actually have any refs (but if we check them all, one of them will be the failing one). The default batch-check format includes %(objecttype) and %(objectsize), both of which require us to access the actual pack data (and thus requires looking at the offset). 4a. Usually, this succeeds. We try to output object X, do a lookup via the midx for the type/size lookup, and run into the corrupt large-offset table. 4b. But sometimes we hit a different error. If another object points to X as a delta base, then trying to find the type of that object requires walking the delta chain to the base entry (since only the base has the concrete type; deltas themselves are either OFS_DELTA or REF_DELTA). Normally this would not require separate offset lookups at all, as deltas are usually stored as OFS_DELTA, specifying the relative offset to the base. But the corrupt idx created in step 1 is done directly with "git pack-objects" and does not pass the --delta-base-offset option, meaning we have REF_DELTA entries! Those do have to consult an index to find the location of the base object, and they use the pack .idx to do this. The same pack .idx that we know is corrupted from step 1! Git does notice the error, but it does so by seeing the corrupt .idx file, not the corrupt midx file, and the error it reports is different, causing the test to fail. The set of objects created in the test is deterministic. But the delta selection seems not to be (which is not too surprising, as it is multi-threaded). I have seen the failure in Windows CI but haven't reproduced it locally (not even with --stress). Re-running a failed Windows CI job tends to work. But when I download and examine the trash directory from a failed run, it shows a different set of deltas than I get locally. But the exact source of non-determinism isn't that important; our test should be robust against any order. There are a few options to fix this: a. It would be OK for the "objects64" setup to "unbreak" the .idx file after generating the midx. But then it would be hard for subsequent tests to reuse it, since it is the corrupted idx that forces the midx to have a large offset table. b. The "objects64" setup could use --delta-base-offset. This would fix our problem, but earlier tests have many hard-coded offsets. Using OFS_DELTA would change the locations of objects in the pack (this might even be OK because I think most of the offsets are within the .idx file, but it seems brittle and I'm afraid to touch it). c. Our cat-file output is in oid order by default. Since we store bases before deltas, if we went in pack order (using the "--unordered" flag), we'd always see our corrupt X before any delta which depends on it. But using "--unordered" means we skip the midx entirely. That makes sense, since it is just enumerating all of the packs, using the offsets found in their .idx files directly. So it doesn't work for our test. d. We could ask directly about object X, rather than enumerating all of them. But that requires further hard-coding of the oid (both sha1 and sha256) of object X. I'd prefer not to introduce more brittleness. e. We can use a --batch-check format that looks at the pack data, but doesn't have to chase deltas. The problem in this case is %(objecttype), which has to walk to the base. But %(objectsize) does not; we can get the value directly from the delta itself. Another option would be %(deltabase), where we report the REF_DELTA name but don't look at its data. I've gone with option (e) here. It's kind of subtle, but it's simple and has no side effects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-14 10:17:25 -07:00
Junio C Hamano	2920971a7f	Merge branch 'jk/decoration-and-other-leak-fixes' Leakfix. * jk/decoration-and-other-leak-fixes: daemon: free listen_addr before returning revision: clear decoration structs during release_revisions() decorate: add clear_decoration() function	2023-10-13 14:18:28 -07:00
Junio C Hamano	09dcbb486d	Merge branch 'ar/diff-index-merge-base-fix' "git diff --merge-base X other args..." insisted that X must be a commit and errored out when given an annotated tag that peels to a commit, but we only need it to be a committish. This has been corrected. * ar/diff-index-merge-base-fix: diff: fix --merge-base with annotated tags	2023-10-13 14:18:28 -07:00
Junio C Hamano	b32f5b6b34	Merge branch 'js/submodule-fix-misuse-of-path-and-name' In .gitmodules files, submodules are keyed by their names, and the path to the submodule whose name is $name is specified by the submodule.$name.path variable. There were a few codepaths that mixed the name and path up when consulting the submodule database, which have been corrected. It took long for these bugs to be found as the name of a submodule initially is the same as its path, and the problem does not surface until it is moved to a different path, which apparently happens very rarely. * js/submodule-fix-misuse-of-path-and-name: t7420: test that we correctly handle renamed submodules t7419: test that we correctly handle renamed submodules t7419, t7420: use test_cmp_config instead of grepping .gitmodules t7419: actually test the branch switching submodule--helper: return error from set-url when modifying failed submodule--helper: use submodule_from_path in set-{url,branch}	2023-10-13 14:18:28 -07:00
Junio C Hamano	a45eddec40	Merge branch 'jk/commit-graph-leak-fixes' Leakfix. * jk/commit-graph-leak-fixes: commit-graph: clear oidset after finishing write commit-graph: free write-context base_graph_name during cleanup commit-graph: free write-context entries before overwriting commit-graph: free graph struct that was not added to chain commit-graph: delay base_graph assignment in add_graph_to_chain() commit-graph: free all elements of graph chain commit-graph: move slab-clearing to close_commit_graph() merge: free result of repo_get_merge_bases() commit-reach: free temporary list in get_octopus_merge_bases() t6700: mark test as leak-free	2023-10-13 14:18:28 -07:00
Junio C Hamano	c75e91499b	Merge branch 'la/trailer-test-and-doc-updates' Test coverage for trailers has been improved. * la/trailer-test-and-doc-updates: trailer doc: <token> is a <key> or <keyAlias>, not both trailer doc: separator within key suppresses default separator trailer doc: emphasize the effect of configuration variables trailer --unfold help: prefer "reformat" over "join" trailer --parse docs: add explanation for its usefulness trailer --only-input: prefer "configuration variables" over "rules" trailer --parse help: expose aliased options trailer --no-divider help: describe usual "---" meaning trailer: trailer location is a place, not an action trailer doc: narrow down scope of --where and related flags trailer: add tests to check defaulting behavior with --no-* flags trailer test description: this tests --where=after, not --where=before trailer tests: make test cases self-contained	2023-10-13 14:18:27 -07:00
Jason Hatton	5143ac07b1	Prevent git from rehashing 4GiB files The index stores file sizes using a uint32_t. This causes any file that is a multiple of 2^32 to have a cached file size of zero. Zero is a special value used by racily clean. This causes git to rehash every file that is a multiple of 2^32 every time git status or git commit is run. This patch mitigates the problem by making all files that are a multiple of 2^32 appear to have a size of 1<<31 instead of zero. The value of 1<<31 is chosen to keep it as far away from zero as possible to help prevent things getting mixed up with unpatched versions of git. An example would be to have a 2^32 sized file in the index of patched git. Patched git would save the file as 2^31 in the cache. An unpatched git would very much see the file has changed in size and force it to rehash the file, which is safe. The file would have to grow or shrink by exactly 2^31 and retain all of its ctime, mtime, and other attributes for old git to not notice the change. This patch does not change the behavior of any file that is not an exact multiple of 2^32. Signed-off-by: Jason D. Hatton <jhatton@globalfinishing.com> Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-13 13:33:35 -07:00
brian m. carlson	678eb55f5d	t: add a test helper to truncate files In a future commit, we're going to work with some large files which will be at least 4 GiB in size. To take advantage of the sparseness functionality on most Unix systems and avoid running the system out of disk, it would be convenient to use truncate(2) to simply create a sparse file of sufficient size. However, the GNU truncate(1) utility isn't portable, so let's write a tiny test helper that does the work for us. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-13 13:33:35 -07:00
John Cai	9f9c40cf34	attr: add attr.tree for setting the treeish to read attributes from `44451a2` (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06) provided the ability to pass in a treeish as the attr source. In the context of serving Git repositories as bare repos like we do at GitLab however, it would be easier to point --attr-source to HEAD for all commands by setting it once. Add a new config attr.tree that allows this. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-13 11:43:29 -07:00
John Cai	2386535511	attr: read attributes from HEAD when bare repo The motivation for `44451a2e5e` (attr: teach "--attr-source=<tree>" global option to "git" , 2023-05-06), was to make it possible to use gitattributes with bare repositories. To make it easier to read gitattributes in bare repositories however, let's just make HEAD:.gitattributes the default. This is in line with how mailmap works, `8c473cecfd` (mailmap: default mailmap.blob in bare repositories, 2012-12-13). Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-13 11:43:29 -07:00
Junio C Hamano	5b2424b658	grep: -f <path> is relative to $cwd Just like OPT_FILENAME() does, "git grep -f <path>" should treat the <path> relative to the original $cwd by paying attention to the prefix the command is given. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-12 10:41:59 -07:00
Junio C Hamano	d9b6634589	stash: be careful what we store "git stash store" is meant to store what "git stash create" produces, as these two are implementation details of the end-user facing "git stash save" command. Even though it is clearly documented as such, users would try silly things like "git stash store HEAD" to render their stash unusable. Worse yet, because "git stash drop" does not allow such a stash entry to be removed, "git stash clear" would be the only way to recover from such a mishap. Reuse the logic that allows "drop" to refrain from working on such a stash entry to teach "store" to avoid storing an object that is not a stash entry in the first place. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-11 16:27:30 -07:00
Junio C Hamano	1fdedb7c7d	Merge branch 'cc/repack-sift-filtered-objects-to-separate-pack' "git repack" machinery learns to pay attention to the "--filter=" option. * cc/repack-sift-filtered-objects-to-separate-pack: gc: add `gc.repackFilterTo` config option repack: implement `--filter-to` for storing filtered out objects gc: add `gc.repackFilter` config option repack: add `--filter=<filter-spec>` option pack-bitmap-write: rebuild using new bitmap when remapping repack: refactor finding pack prefix repack: refactor finishing pack-objects command t/helper: add 'find-pack' test-tool pack-objects: allow `--filter` without `--stdout`	2023-10-10 11:39:15 -07:00
Junio C Hamano	afb0d0880a	Merge branch 'ds/init-diffstat-width' Code clean-up. * ds/init-diffstat-width: diff --stat: set the width defaults in a helper function	2023-10-10 11:39:14 -07:00
Junio C Hamano	a7a2d10421	Merge branch 'cw/prelim-cleanup' Shuffle some bits across headers and sources to prepare for libification effort. * cw/prelim-cleanup: parse: separate out parsing functions from config.h config: correct bad boolean env value error message wrapper: reduce scope of remove_or_warn() hex-ll: separate out non-hash-algo functions	2023-10-10 11:39:14 -07:00
Jeff King	12192a9db9	commit-graph: detect out-of-order BIDX offsets The BIDX chunk tells us the offsets at which each commit's Bloom filters can be found in the BDAT chunk. We compute the length of each filter by checking the offsets of neighbors and subtracting them. If the offsets are out of order, then we'll get a negative length, which we then store as a very large unsigned value. This can cause us to read out-of-bounds memory, as we access the hash data modulo "filter->len * BITS_PER_WORD". We can easily detect this case when loading the individual filters. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:02 -07:00
Jeff King	581e0f8b18	commit-graph: check bounds when accessing BIDX chunk We load the bloom_filter_indexes chunk using pair_chunk(), so we have no idea how big it is. This can lead to out-of-bounds reads if it is smaller than expected, since we index it based on the number of commits found elsewhere in the graph file. We can check the chunk size up front, like we do for CDAT and other chunks with one fixed-size record per commit. The test case demonstrates the problem. It actually won't segfault, because we end up reading random data from the follow-on chunk (BDAT in this case), and the bounds checks added in the previous patch complain. But this is by no means assured, and you can craft a commit-graph file with BIDX at the end (or a smaller BDAT) that does segfault. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	920f400e91	commit-graph: check bounds when accessing BDAT chunk When loading Bloom filters from a commit-graph file, we use the offset values in the BIDX chunk to index into the memory mapped for the BDAT chunk. But since we don't record how big the BDAT chunk is, we just trust that the BIDX offsets won't cause us to read outside of the chunk memory. A corrupted or malicious commit-graph file will cause us to segfault (in practice this isn't a very interesting attack, since commit-graph files are local-only, and the worst case is an out-of-bounds read). We can't fix this by checking the chunk size during parsing, since the data in the BDAT chunk doesn't have a fixed size (that's why we need the BIDX in the first place). So we'll fix it in two parts: 1. Record the BDAT chunk size during parsing, and then later check that the BIDX offsets we look up are within bounds. 2. Because the offsets are relative to the end of the BDAT header, we must also make sure that the BDAT chunk is at least as large as the expected header size. Otherwise, we overflow when trying to move past the header, even for an offset of "0". We can check this early, during the parsing stage. The error messages are rather verbose, but since this is not something you'd expect to see outside of severe bugs or corruption, it makes sense to err on the side of too many details. Sadly we can't mention the filename during the chunk-parsing stage, as we haven't set g->filename at this point, nor passed it down through the stack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	ee6a792412	commit-graph: bounds-check generation overflow chunk If the generation entry in a commit-graph doesn't fit, we instead insert an offset into a generation overflow chunk. But since we don't record the size of the chunk, we may read outside the chunk if the offset we find on disk is malicious or corrupted. We can't check the size of the chunk up-front; it will vary based on how many entries need overflow. So instead, we'll do a bounds-check before accessing the chunk memory. Unfortunately there is no error-return from this function, so we'll just have to die(), which is what it does for other forms of corruption. As with other cases, we can drop the st_mult() call, since we know our bounds-checked value will fit within a size_t. Before this patch, the test here actually "works" because we read garbage data from the next chunk. And since that garbage data happens not to provide a generation number which changes the output, it appears to work. We could construct a case that actually segfaults or produces wrong output, but it would be a bit tricky. For our purposes its sufficient to check that we've detected the bounds error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	4a3c34662b	commit-graph: check size of generations chunk We neither check nor record the size of the generations chunk we parse from a commit-graph file. This should have one uint32_t for each commit in the file; if it is smaller (due to corruption, etc), we may read outside the mapped memory. The included test segfaults without this patch, as it shrinks the size considerably (and the chunk is near the end of the file, so we read off the end of the array rather than accidentally reading another chunk). We can fix this by checking the size up front (like we do for other fixed-size chunks, like CDAT). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	6cf61d0db5	commit-graph: bounds-check base graphs chunk When we are loading a commit-graph chain, we check that each slice of the chain points to the appropriate set of base graphs via its BASE chunk. But since we don't record the size of the chunk, we may access out-of-bounds memory if the file is corrupted. Since we know the number of entries we expect to find (based on the position within the commit-graph-chain file), we can just check the size up front. In theory this would also let us drop the st_mult() call a few lines later when we actually access the memory, since we know that the computed offset will fit in a size_t. But because the operands "g->hash_len" and "n" have types "unsigned char" and "int", we'd have to cast to size_t first. Leaving the st_mult() does that cast, and makes it more obvious that we don't have an overflow problem. Note that the test does not actually segfault before this patch, since it just reads garbage from the chunk after BASE (and indeed, it even rejects the file because that garbage does not have the expected hash value). You could construct a file with BASE at the end that did segfault, but corrupting the existing one is easy, and we can check stderr for the expected message. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	9622610e55	commit-graph: detect out-of-bounds extra-edges pointers If an entry in a commit-graph file has more than 2 parents, the fixed-size parent fields instead point to an offset within an "extra edges" chunk. We blindly follow these, assuming that the chunk is present and sufficiently large; this can lead to an out-of-bounds read for a corrupt or malicious file. We can fix this by recording the size of the chunk and adding a bounds-check in fill_commit_in_graph(). There are a few tricky bits: 1. We'll switch from working with a pointer to an offset. This makes some corner cases just fall out naturally: a. If we did not find an EDGE chunk at all, our size will correctly be zero (so everything is "out of bounds"). b. Comparing "size / 4" lets us make sure we have at least 4 bytes to read, and we never compute a pointer more than one element past the end of the array (computing a larger pointer is probably OK in practice, but is technically undefined behavior). c. The current code casts to "uint32_t ". Replacing it with an offset avoids any comparison between different types of pointer (since the chunk is stored as "unsigned char "). 2. This is the first case in which fill_commit_in_graph() may return anything but success. We need to make sure to roll back the "parsed" flag (and any parents we might have added before running out of buffer) so that the caller can cleanly fall back to loading the commit object itself. It's a little non-trivial to do this, and we might benefit from factoring it out. But we can wait on that until we actually see a second case where we return an error. As a bonus, this lets us drop the st_mult() call. Since we've already done a bounds check, we know there won't be any integer overflow (it would imply our buffer is larger than a size_t can hold). The included test does not actually segfault before this patch (though you could construct a case where it does). Instead, it reads garbage from the next chunk which results in it complaining about a bogus parent id. This is sufficient for our needs, though (we care that the fallback succeeds, and that stderr mentions the out-of-bounds read). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	b72df612af	commit-graph: check size of commit data chunk We expect a commit-graph file to have a fixed-size data record for each commit in the file (and we know the number of commits to expct from the size of the lookup table). If we encounter a file where this is too small, we'll look past the end of the chunk (and possibly even off the mapped memory). We can fix this by checking the size up front when we record the pointer. The included test doesn't segfault, since it ends up reading bytes from another chunk. But it produces nonsense results, since the values it reads are garbage. Our test notices this by comparing the output to a non-corrupted run of the same command (and of course we also check that the expected error is printed to stderr). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	c0fe9b2da5	midx: check size of revindex chunk When we load a revindex from disk, we check the size of the file compared to the number of objects we expect it to have. But when we use a RIDX chunk stored directly in the midx, we just access the memory directly. This can lead to out-of-bounds memory access for a corrupted or malicious multi-pack-index file. We can catch this by recording the RIDX chunk size, and then checking it against the expected size when we "load" the revindex. Note that this check is much simpler than the one that load_revindex_from_disk() does, because we just have the data array with no header (so we do not need to account for the header size, and nor do we need to bother validating the header values). The test confirms both that we catch this case, and that we continue the process (the revindex is required to use the midx bitmaps, but we fallback to a non-bitmap traversal). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	2abd56e9b2	midx: bounds-check large offset chunk When we see a large offset bit in the regular midx offset table, we use the entry as an index into a separate large offset table (just like a pack idx does). But we don't bounds-check the access to that large offset table (nor even record its size when we parse the chunk!). The equivalent code for a regular pack idx is in check_pack_index_ptr(). But things are a bit simpler here because of the chunked format: we can just check our array index directly. As a bonus, we can get rid of the st_mult() here. If our array bounds-check is successful, then we know that the result will fit in a size_t (and the bounds check uses a division to avoid overflow entirely). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	0924869b4e	midx: check size of object offset chunk The object offset chunk has one fixed-size entry for each object in the midx. But since we don't check its size, we may access out-of-bounds memory if we see a corrupt or malicious midx file. Sine the entries are fixed-size, the total length can be known up-front, and we can just check it while parsing the chunk (this is similar to what we do when opening pack idx files, which contain a similar offset table). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	c9b9fefc13	midx: enforce chunk alignment on reading The midx reader assumes chunks are aligned to a 4-byte boundary: we treat the fanout chunk as an array of uint32_t, indexing it to feed the results to ntohl(). Without aligning the chunks, we may violate the CPU's alignment constraints. Though many platforms allow this, some do not. And certanily UBSan will complain, since it is undefined behavior. Even though most chunks are naturally 4-byte-aligned (because they are storing uint32_t or larger types), PNAM is not. It stores NUL-terminated pack names, so you can have a valid chunk with any length. The writing side handles this by 4-byte-aligning the chunk, introducing a few extra NULs as necessary. But since we don't check this on the reading side, we may end up with a misaligned fanout and trigger the undefined behavior. We have two options here: 1. Swap out ntohl(fanout[i]) for get_be32(fanout+i) everywhere. The latter handles alignment itself. It's possible that it's slightly slower (though in practice I'm not sure how true that is, especially for these code paths which then go on to do a binary search). 2. Enforce the alignment when reading the chunks. This is easy to do, since the table-of-contents reader can check it in one spot. I went with the second option here, just because it places less burden on maintenance going forward (it is OK to continue using ntohl), and we know it can't have any performance impact on the actual reads. The commit-graph code uses the same chunk API. It's usually also 4-byte aligned, but some chunks are not (like Bloom filter BDAT chunks). So we'll pass "1" here to allow any alignment. It doesn't suffer from the same problem as midx with its fanout because the fanout chunk is always the first (and the rest of the format dictates that the first chunk will start aligned). The new test shows the effect on a midx with a misaligned PNAM chunk. Note that the midx-reading code treats chunk-toc errors as soft, falling back to the non-midx path rather than calling die(), as we do for other parsing errors. Arguably we should make all of these behave the same, but that's out of scope for this patch. For now the test just expects the fallback behavior. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	72a9a08283	midx: check size of pack names chunk We parse the pack-name chunk as a series of NUL-terminated strings. But since we don't look at the chunk size, there's nothing to guarantee that we don't parse off the end of the chunk (or even off the end of the mapped file). We can record the length, and then as we parse make sure that we never walk past it. The new test exercises the case, though note that it does not actually segfault before this patch. It hits a NUL byte somewhere in one of the other chunks, and comes up with a garbage pack name. You could construct one that reads out-of-bounds (e.g., a PNAM chunk at the end of file), but this case is simple and sufficient to check that we detect the problem. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	4169d89645	commit-graph: check consistency of fanout table We use bsearch_hash() to look up items in the oid index of a commit-graph. It also has a fanout table to reduce the initial range in which we'll search. But since the fanout comes from the on-disk file, a corrupted or malicious file can cause us to look outside of the allocated index memory. One solution here would be to pass the total table size to bsearch_hash(), which could then bounds check the values it reads from the fanout. But there's an inexpensive up-front check we can do, and it's the same one used by the midx and pack idx code (both of which likewise have fanout tables and use bsearch_hash(), but are not affected by this bug): 1. We can check the value of the final fanout entry against the size of the table we got from the index chunk. These must always match, since the fanout is just slicing up the index. As a side note, the midx and pack idx code compute it the other way around: they use the final fanout value as the object count, and check the index size against it. Either is valid; if they disagree we cannot know which is wrong (a corrupted fanout value, or a too-small table of oids). 2. We can quickly scan the fanout table to make sure it is monotonically increasing. If it is, then we know that every value is less than or equal to the final value, and therefore less than or equal to the table size. It would also be sufficient to just check that each fanout value is smaller than the final one, but the midx and pack idx code both do a full monotonicity check. It's the same cost, and it catches some other corruptions (though not all; the checks done by "commit-graph verify" are more complete but more expensive, and our goal here is to be fast and memory-safe). There are two new tests. One just checks the final fanout value (this is the mirror image of the "too small oid lookup" case added for the midx in the previous commit; it's flipped here because commit-graph considers the oid lookup chunk to be the source of truth). The other actually creates a fanout with many out-of-bounds entries, and prior to this patch, it does cause the segfault you'd expect. But note that the error is not "your fanout entry is out-of-bounds", but rather "fanout value out of order". That's because we leave the final fanout value in place (to get past the table size check), making the index non-monotonic (the second-to-last entry is big, but the last one must remain small to match the actual table). We need adjustments to a few existing tests, as well: - an earlier test in t5318 corrupts the fanout and runs "commit-graph verify". Its message is now changed, since we catch the problem earlier (during the load step, rather than the careful validation step). - in t5324, we test that "commit-graph verify --shallow" does not do expensive verification on the base file of the chain. But the corruption it uses (munging a byte at offset 1000) happens to be in the middle of the fanout table. And now we detect that problem in the cheaper checks that are performed for every part of the graph. We'll push this back to offset 1500, which is only caught by the more expensive checksum validation. Likewise, there's a later test in t5324 which munges an offset 100 bytes into a file (also in the fanout table) that is referenced by an alternates file. So we now find that corruption during the load step, rather than the verification step. At the very least we need to change the error message (like the case above in t5318). But it is probably good to make sure we handle all parts of the verification even for alternate graph files. So let's likewise corrupt byte 1500 and make sure we found the invalid checksum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Jeff King	fc926567ed	midx: check size of oid lookup chunk When reading an on-disk multi-pack-index, we take the number of objects in the midx from the final value of the fanout table. But we just blindly assume that the chunk containing the actual oid entries is the correct size. This can lead to us reading out-of-bounds memory if the lookup chunk is too small (or if the fanout is corrupted; when they don't agree we cannot tell which one is wrong). Note that we bump the assignment of m->num_objects into the fanout parser callback, so that it's set when we parse the lookup table (otherwise we'd have to manually record the lookup table size and check it later). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Jeff King	52e2e8d43d	commit-graph: check size of oid fanout chunk We load the oid fanout chunk with pair_chunk(), which means we never see the size of the chunk. We just assume the on-disk file uses the appropriate size, and if it's too small we'll access random memory. It's easy to check this up-front; the fanout always consists of 256 uint32's, since it is a fanout of the first byte of the hash pointing into the oid index. These parameters can't be changed without introducing a new chunk type. This matches the similar check in the midx OIDF chunk (but note that rather than checking for the error immediately, the graph code just leaves parts of the struct NULL and checks for required fields later). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Jeff King	e3c9600397	midx: stop ignoring malformed oid fanout chunk When we load the oid-fanout chunk, our callback checks that its size is reasonable and returns an error if not. However, the caller only checks our return value against CHUNK_NOT_FOUND, so we end up ignoring the error completely! Using a too-small fanout table means we end up accessing random memory for the fanout and segfault. We can fix this by checking for any non-zero return value, rather than just CHUNK_NOT_FOUND, and adjusting our error message to cover both cases. We could handle each error code individually, but there's not much point for such a rare case. The extra message produced in the callback makes it clear what is going on. The same pattern is used in the adjacent code. Those cases are actually OK for now because they do not use a custom callback, so the only error they can get is CHUNK_NOT_FOUND. But let's convert them, as this is an accident waiting to happen (especially as we convert some of them away from pair_chunk). The error messages are more verbose, but it should be rare for a user to see these anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Jeff King	86b008ee61	t: add library for munging chunk-format files When testing corruption of files using the chunk format (like commit-graphs and midx files), it's helpful to be able to modify bytes in specific chunks. This requires being able both to read the table-of-contents (to find the chunk to modify) but also to adjust it (to account for size changes in the offsets of subsequent chunks). We have some tests already which corrupt chunk files, but they have some downsides: 1. They are very brittle, as they manually compute the expected size of a particular instance of the file (e.g., see the definitions starting with NUM_OBJECTS in t5319). 2. Because they rely on manual offsets and don't read the table-of-contents, they're limited to overwriting bytes. But there are many interesting corruptions that involve changing the sizes of chunks (especially smaller-than-expected ones). This patch adds a perl script which makes such corruptions easy. We'll use it in subsequent patches. Note that we could get by with just a big "perl -e" inside the helper function. I chose to put it in a separate script for two reasons. One, so we don't have to worry about the extra layer of shell quoting. And two, the script is kind of big, and running the tests with "-x" would repeatedly dump it into the log output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Victoria Dye	5305474ec4	ref-cache.c: fix prefix matching in ref iteration Update 'cache_ref_iterator_advance' to skip over refs that are not matched by the given prefix. Currently, a ref entry is considered "matched" if the entry name is fully contained within the prefix: * prefix: "refs/heads/v1" * entry: "refs/heads/v1.0" OR if the prefix is fully contained in the entry name: * prefix: "refs/heads/v1.0" * entry: "refs/heads/v1" The first case is always correct, but the second is only correct if the ref cache entry is a directory, for example: * prefix: "refs/heads/example" * entry: "refs/heads/" Modify the logic in 'cache_ref_iterator_advance' to reflect these expectations: 1. If 'overlaps_prefix' returns 'PREFIX_EXCLUDES_DIR', then the prefix and ref cache entry do not overlap at all. Skip this entry. 2. If 'overlaps_prefix' returns 'PREFIX_WITHIN_DIR', then the prefix matches inside this entry if it is a directory. Skip if the entry is not a directory, otherwise iterate over it. 3. Otherwise, 'overlaps_prefix' returned 'PREFIX_CONTAINS_DIR', indicating that the cache entry (directory or not) is fully contained by or equal to the prefix. Iterate over this entry. Note that condition 2 relies on the names of directory entries having the appropriate trailing slash. The existing function documentation of 'create_dir_entry' explicitly calls out the trailing slash requirement, so this is a safe assumption to make. This bug generally doesn't have any user-facing impact, since it requires: 1. using a non-empty prefix without a trailing slash in an iteration like 'for_each_fullref_in', 2. the callback to said iteration not reapplying the original filter (as for-each-ref does) to ensure unmatched refs are skipped, and 3. the repository having one or more refs that match part of, but not all of, the prefix. However, there are some niche scenarios that meet those criteria (specifically, 'rev-parse --bisect' and '(log\|show\|shortlog) --bisect'). Add tests covering those cases to demonstrate the fix in this patch. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:53:13 -07:00
John Cai	e95bafc52f	merge-ort: initialize repo in index state initialize_attr_index() does not initialize the repo member of attr_index. Starting in `44451a2e5e` (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06), this became a problem because istate->repo gets passed down the call chain starting in git_check_attr(). This gets passed all the way down to replace_refs_enabled(), which segfaults when accessing r->gitdir. Fix this by initializing the repository in the index state. Signed-off-by: John Cai <johncai86@gmail.com> Helped-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 14:42:02 -07:00
Sergey Organov	c8e5cb0658	diff-merges: introduce '--dd' option This option provides a shortcut to request diff with respect to first parent for any kind of commit, universally. It's implemented as pure synonym for "--diff-merges=first-parent --patch". Gives user quick and universal way to see what changes, exactly, were brought to a branch by merges as well as by regular commits. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 12:47:29 -07:00
Andy Koppe	2b09d16aba	pretty: fix ref filtering for %(decorate) formats Mark pretty formats containing "%(decorate" as requiring decoration in userformat_find_requirements(), same as "%d" and "%D". Without this, cmd_log_init_finish() didn't invoke load_ref_decorations() with the decoration_filter it puts together, and hence filtering options such as --decorate-refs were quietly ignored. Amend one of the %(decorate) checks in t4205-log-pretty-formats.sh to test this. Signed-off-by: Andy Koppe <andy.koppe@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 11:25:13 -07:00
Jeff King	badf2fe1c3	daemon: free listen_addr before returning We build up a string list of listen addresses from the command-line arguments, but never free it. This causes t5811 to complain of a leak (though curiously it seems to do so only when compiled with gcc, not with clang). To handle this correctly, we have to do a little refactoring: - there are two exit points from the main function, depending on whether we are entering the main loop or serving a single client (since rather than a traditional fork model, we re-exec ourselves with the extra "--serve" argument to accommodate Windows). We don't need --listen at all in the --serve case, of course, but it is passed along by the parent daemon, which simply copies all of the command-line options it got. - we just "return serve()" to run the main loop, giving us no chance to do any cleanup So let's use a "ret" variable to store the return code, and give ourselves a single exit point at the end. That gives us one place to do cleanup. Note that this code also uses the "use a no-dup string-list, but allocate strings we add to it" trick, meaning string_list_clear() will not realize it should free them. We can fix this by switching to a "dup" string-list, but using the "append_nodup" function to add to it (this is preferable to tweaking the strdup_strings flag before clearing, as it puts all the subtle memory-ownership code together). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 14:54:58 -07:00
Jeff King	8ef8da4842	revision: clear decoration structs during release_revisions() The point of release_revisions() is to free memory associated with the rev_info struct, but we have several "struct decoration" members that are left untouched. Since the previous commit introduced a function to do that, we can just call it. We do have to provide some specialized callbacks to map the void pointers onto real ones (the alternative would be casting the existing function pointers; this generally works because "void *" is usually interchangeable with a struct pointer, but it is technically forbidden by the standard). Since the line-log code does not expose the type it stores in the decoration (nor of course the function to free it), I put this behind a generic line_log_free() entry point. It's possible we may need to add more line-log specific bits anyway (running t4211 shows a number of other leaks in the line-log code). While this doubtless cleans up many leaks triggered by the test suite, the only script which becomes leak-free is t4217, as it does very little beyond a simple traversal (its existing leak was from the use of --children, which is now fixed). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 14:54:57 -07:00
Jeff King	771868243c	decorate: add clear_decoration() function There's not currently any way to free the resources associated with a decoration struct. As a result, we have several memory leaks which cannot easily be plugged. Let's add a "clear" function and make use of it in the example code of t9004. This removes the only leak from that script, so we can mark it as passing the leak sanitizer. Curiously this leak is found only when running SANITIZE=leak with clang, but not with gcc. But it is a bog-standard leak: we allocate some memory in a local variable struct, and then exit main() without releasing it. I'm not sure why gcc doesn't find it. After this patch, both compilers report it as leak-free. Note that the clear function takes a callback to free the individual entries. That's not needed for our example (which is just decorating with ints), but will be for real callers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 14:54:55 -07:00
Taylor Blau	3c1e2c2113	builtin/repack.c: avoid making cruft packs preferred When doing a `--geometric` repack, we make sure that the preferred pack (if writing a MIDX) is the largest pack that we didn't repack. That has the effect of keeping the preferred pack in sync with the pack containing a majority of the repository's reachable objects. But if the repository happens to double in size, we'll repack everything. Here we don't specify any `--preferred-pack`, and instead let the MIDX code choose. In the past, that worked fine, since there would only be one pack to choose from: the one we just wrote. But it's no longer necessarily the case that there is one pack to choose from. It's possible that the repository also has a cruft pack, too. If the cruft pack happens to come earlier in lexical order (and has an earlier mtime than any non-cruft pack), we'll pick that pack as preferred. This makes it impossible to reuse chunks of the reachable pack verbatim from pack-objects, so is sub-optimal. Luckily, this is a somewhat rare circumstance to be in, since we would have to repack the entire repository during a `--geometric` repack, and the cruft pack would have to sort ahead of the pack we just created. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 13:26:11 -07:00
Taylor Blau	37dc6d8104	builtin/repack.c: implement support for `--max-cruft-size` Cruft packs are an alternative mechanism for storing a collection of unreachable objects whose mtimes are recent enough to avoid being pruned out of the repository. When cruft packs were first introduced back in `b757353676` (builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and `a7d493833f` (builtin/pack-objects.c: --cruft with expiration, 2022-05-20), the recommended workflow consisted of: - Repacking periodically, either by packing anything loose in the repository (via `git repack -d`) or producing a geometric sequence of packs (via `git repack --geometric=<d> -d`). - Every so often, splitting the repository into two packs, one cruft to store the unreachable objects, and another non-cruft pack to store the reachable objects. Repositories may (out of band with the above) choose periodically to prune out some unreachable objects which have aged out of the grace period by generating a pack with `--cruft-expiration=<approxidate>`. This allowed repositories to maintain relatively few packs on average, and quarantine unreachable objects together in a cruft pack, avoiding the pitfalls of holding unreachable objects as loose while they age out (for more, see some of the details in `3d89a8c118` (Documentation/technical: add cruft-packs.txt, 2022-05-20)). This all works, but can be costly from an I/O-perspective when frequently repacking a repository that has many unreachable objects. This problem is exacerbated when those unreachable objects are rarely (if every) pruned. Since there is at most one cruft pack in the above scheme, each time we update the cruft pack it must be rewritten from scratch. Because much of the pack is reused, this is a relatively inexpensive operation from a CPU-perspective, but is very costly in terms of I/O since we end up rewriting basically the same pack (plus any new unreachable objects that have entered the repository since the last time a cruft pack was generated). At the time, we decided against implementing more robust support for multiple cruft packs. This patch implements that support which we were lacking. Introduce a new option `--max-cruft-size` which allows repositories to accumulate cruft packs up to a given size, after which point a new generation of cruft packs can accumulate until it reaches the maximum size, and so on. To generate a new cruft pack, the process works like so: - Sort a list of any existing cruft packs in ascending order of pack size. - Starting from the beginning of the list, group cruft packs together while the accumulated size is smaller than the maximum specified pack size. - Combine the objects in these cruft packs together into a new cruft pack, along with any other unreachable objects which have since entered the repository. Once a cruft pack grows beyond the size specified via `--max-cruft-size` the pack is effectively frozen. This limits the I/O churn up to a quadratic function of the value specified by the `--max-cruft-size` option, instead of behaving quadratically in the number of total unreachable objects. When pruning unreachable objects, we bypass the new code paths which combine small cruft packs together, and instead start from scratch, passing in the appropriate `--max-pack-size` down to `pack-objects`, putting it in charge of keeping the resulting set of cruft packs sized correctly. This may seem like further I/O churn, but in practice it isn't so bad. We could prune old cruft packs for whom all or most objects are removed, and then generate a new cruft pack with just the remaining set of objects. But this additional complexity buys us relatively little, because most objects end up being pruned anyway, so the I/O churn is well contained. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 13:26:11 -07:00
Štěpán Němec	f0a39ba504	t/README: fix multi-prerequisite example With the broken quoting the test wouldn't even parse correctly, but there's also the '==' instead of POSIX '=' (of the shells I tested, busybox ash, bash and ksh (93 and OpenBSD) accept '==', dash and zsh do not), and 'print 2' from Python 2 days. (I assume the test failing due to 3 != 4 is intentional or immaterial.) Fixes: `93a5724613` ("test-lib: Add support for multiple test prerequisites") Signed-off-by: Štěpán Němec <stepnem@smrk.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 12:55:38 -07:00
Štěpán Němec	97509a3497	doc: fix some typos, grammar and wording issues Signed-off-by: Štěpán Němec <stepnem@smrk.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-05 12:55:38 -07:00
Junio C Hamano	c3c0020673	Merge branch 'jk/commit-graph-verify-fix' Various fixes to "git commit-graph verify". * jk/commit-graph-verify-fix: commit-graph: report incomplete chains during verification commit-graph: tighten chain size check commit-graph: detect read errors when verifying graph chain t5324: harmonize sha1/sha256 graph chain corruption commit-graph: check mixed generation validation when loading chain file commit-graph: factor out chain opening function	2023-10-04 13:28:53 -07:00
Junio C Hamano	42b495e9c5	Merge branch 'ks/ref-filter-mailmap' "git for-each-ref" and friends learn to apply mailmap to authorname and other fields. * ks/ref-filter-mailmap: ref-filter: add mailmap support t/t6300: introduce test_bad_atom t/t6300: cleanup test_atom	2023-10-04 13:28:53 -07:00
Junio C Hamano	3029189186	Merge branch 'ps/revision-cmdline-stdin-not' "git rev-list --stdin" learned to take non-revisions (like "--not") recently from the standard input, but the way such a "--not" was handled was quite confusing, which has been rethought. This is potentially a change that breaks backward compatibility. * ps/revision-cmdline-stdin-not: revision: make pseudo-opt flags read via stdin behave consistently	2023-10-04 13:28:52 -07:00
Jan Alexander Steffens (heftig)	bd1c20ccd7	t7420: test that we correctly handle renamed submodules Create a second submodule with a name that differs from its path. Test that calling set-url modifies the correct .gitmodules entries. Make sure we don't create a section named after the path instead of the name. Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 15:32:37 -07:00
Jan Alexander Steffens (heftig)	32bff3675e	t7419: test that we correctly handle renamed submodules Add the submodule again with an explicitly different name and path. Test that calling set-branch modifies the correct .gitmodules entries. Make sure we don't create a section named after the path instead of the name. Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 15:32:34 -07:00
Jan Alexander Steffens (heftig)	5fc880632d	t7419, t7420: use test_cmp_config instead of grepping .gitmodules We have a test function to verify config files. Use it as it's more precise. Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 15:32:31 -07:00
Jan Alexander Steffens (heftig)	b027fb0784	t7419: actually test the branch switching The submodule repo the test set up had the 'topic' branch checked out, meaning the repo's default branch (HEAD) is the 'topic' branch. The following tests then pretended to switch between the default branch and the 'topic' branch. This was papered over by continually adding commits to the 'topic' branch and checking if the submodule gets updated to this new commit. Return the submodule repo to the 'main' branch after setup so we can actually test the switching behavior. Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 15:32:27 -07:00
Jeff King	da09e7af68	commit-graph: clear oidset after finishing write In graph_write() we store commits in an oidset, but never clean it up, leaking the contents. We should clear it in the cleanup section. The oidset comes from `6830c36077` (commit-graph.h: replace 'commit_hex' with 'commits', 2020-04-13), but it was just replacing a string_list that was also leaked. Curiously, we fixed the leak of some adjacent variables in commit `fa8953cb40` (builtin/commit-graph.c: extract 'read_one_commit()', 2020-05-18), but the oidset wasn't included for some reason. In combination with the preceding commits, this lets us mark t5324 as leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:24 -07:00
Jeff King	d9c84c6d67	commit-graph: free write-context base_graph_name during cleanup Commit `6c622f9f0b` (commit-graph: write commit-graph chains, 2019-06-18) added a base_graph_name string to the write_commit_graph_context struct. But the end-of-function cleanup forgot to free it, causing a leak. This (presumably in combination with the preceding leak-fixes) lets us mark t5328 as leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:24 -07:00
Jeff King	716a6b2c3a	merge: free result of repo_get_merge_bases() We call repo_get_merge_bases(), which allocates a commit_list, but never free the result, causing a leak. The obvious solution is to free it, but we need to look at the contents of the first item to decide whether to leave the loop. One option is to free it in both code paths. But since the commit that the list points to is longer-lived than the list itself, we can just dereference it immediately, free the list, and then continue with the existing logic. This is about the same amount of code, but keeps the list management all in one place. This lets us mark a number of merge-related test scripts as leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:24 -07:00
Jeff King	ec97ad120c	commit-reach: free temporary list in get_octopus_merge_bases() We loop over the set of commits to merge, and for each one compute the merge base against the existing set of merge base candidates we've found. Then we replace the candidate set with a simple assignment of the list head, leaking the old list. We should free it first before assignment. This makes t5521 leak-free, so mark it as such. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:23 -07:00
Jeff King	be4b578c69	t6700: mark test as leak-free This test has never leaked since it was added. Let's annotate it to make sure it stays that way (and to reduce noise when looking for other leak-free scripts after we fix some leaks). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:23 -07:00
Taylor Blau	78de1c6c32	t7700: split cruft-related tests to t7704 A small handful of the tests in t7700 (the main script for testing functionality of 'git repack') are specifically related to cruft pack operations. Prepare for adding new cruft pack-related tests by moving the existing set into a new test script. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 18:28:47 -07:00
Christian Couder	9b96046b92	gc: add `gc.repackFilterTo` config option A previous commit implemented the `gc.repackFilter` config option to specify a filter that should be used by `git gc` when performing repacks. Another previous commit has implemented `git repack --filter-to=<dir>` to specify the location of the packfile containing filtered out objects when using a filter. Let's implement the `gc.repackFilterTo` config option to specify that location in the config when `gc.repackFilter` is used. Now when `git gc` will perform a repack with a <dir> configured through this option and not empty, the repack process will be passed a corresponding `--filter-to=<dir>` argument. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:54:31 -07:00
Christian Couder	71c5aec1f5	repack: implement `--filter-to` for storing filtered out objects A previous commit has implemented `git repack --filter=<filter-spec>` to allow users to filter out some objects from the main pack and move them into a new different pack. It would be nice if this new different pack could be created in a different directory than the regular pack. This would make it possible to move large blobs into a pack on a different kind of storage, for example cheaper storage. Even in a different directory, this pack can be accessible if, for example, the Git alternates mechanism is used to point to it. In fact not using the Git alternates mechanism can corrupt a repo as the generated pack containing the filtered objects might not be accessible from the repo any more. So setting up the Git alternates mechanism should be done before using this feature if the user wants the repo to be fully usable while this feature is used. In some cases, like when a repo has just been cloned or when there is no other activity in the repo, it's Ok to setup the Git alternates mechanism afterwards though. It's also Ok to just inspect the generated packfile containing the filtered objects and then just move it into the '.git/objects/pack/' directory manually. That's why it's not necessary for this command to check that the Git alternates mechanism has been already setup. While at it, as an example to show that `--filter` and `--filter-to` work well with other options, let's also add a test to check that these options work well with `--max-pack-size`. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:54:31 -07:00
Christian Couder	1cd43a9ed9	gc: add `gc.repackFilter` config option A previous commit has implemented `git repack --filter=<filter-spec>` to allow users to filter out some objects from the main pack and move them into a new different pack. Users might want to perform such a cleanup regularly at the same time as they perform other repacks and cleanups, so as part of `git gc`. Let's allow them to configure a <filter-spec> for that purpose using a new gc.repackFilter config option. Now when `git gc` will perform a repack with a <filter-spec> configured through this option and not empty, the repack process will be passed a corresponding `--filter=<filter-spec>` argument. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:54:30 -07:00
Christian Couder	48a9b67b43	repack: add `--filter=<filter-spec>` option This new option puts the objects specified by `<filter-spec>` into a separate packfile. This could be useful if, for example, some blobs take up a lot of precious space on fast storage while they are rarely accessed. It could make sense to move them into a separate cheaper, though slower, storage. It's possible to find which new packfile contains the filtered out objects using one of the following: - `git verify-pack -v ...`, - `test-tool find-pack ...`, which a previous commit added, - `--filter-to=<dir>`, which a following commit will add to specify where the pack containing the filtered out objects will be. This feature is implemented by running `git pack-objects` twice in a row. The first command is run with `--filter=<filter-spec>`, using the specified filter. It packs objects while omitting the objects specified by the filter. Then another `git pack-objects` command is launched using `--stdin-packs`. We pass it all the previously existing packs into its stdin, so that it will pack all the objects in the previously existing packs. But we also pass into its stdin, the pack created by the previous `git pack-objects --filter=<filter-spec>` command as well as the kept packs, all prefixed with '^', so that the objects in these packs will be omitted from the resulting pack. The result is that only the objects filtered out by the first `git pack-objects` command are in the pack resulting from the second `git pack-objects` command. As the interactions with kept packs are a bit tricky, a few related tests are added. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:54:30 -07:00
Christian Couder	66589f89ab	t/helper: add 'find-pack' test-tool In a following commit, we will make it possible to separate objects in different packfiles depending on a filter. To make sure that the right objects are in the right packs, let's add a new test-tool that can display which packfile(s) a given object is in. Let's also make it possible to check if a given object is in the expected number of packfiles with a `--check-count <n>` option. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:54:29 -07:00
Christian Couder	6cfcabfb9f	pack-objects: allow `--filter` without `--stdout` `9535ce7337` (pack-objects: add list-objects filtering, 2017-11-21) taught `git pack-objects` to use `--filter`, but required the use of `--stdout` since a partial clone mechanism was not yet in place to handle missing objects. Since then, changes like `9e27beaa23` (promisor-remote: implement promisor_remote_get_direct(), 2019-06-25) and others added support to dynamically fetch objects that were missing. Even without a promisor remote, filtering out objects can also be useful if we can put the filtered out objects in a separate pack, and in this case it also makes sense for pack-objects to write the packfile directly to an actual file rather than on stdout. Remove the `--stdout` requirement when using `--filter`, so that in a follow-up commit, repack can pass `--filter` to pack-objects to omit certain objects from the resulting packfile. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:54:29 -07:00
Alyssa Ross	4adceb5a29	diff: fix --merge-base with annotated tags Checking early for OBJ_COMMIT excludes other objects that can be resolved to commits, like annotated tags. If we remove it, annotated tags will be resolved and handled just fine by lookup_commit_reference(), and if we are given something that can't be resolved to a commit, we'll still get a useful error message, e.g.: > error: object 21ab162211ac3ef13c37603ca88b27e9c7e0d40b is a tree, not a commit > fatal: no merge base found Signed-off-by: Alyssa Ross <hi@alyssa.is> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 11:55:42 -07:00
Junio C Hamano	5bb67fb7ab	Merge branch 'jc/unresolve-removal' "checkout --merge -- path" and "update-index --unresolve path" did not resurrect conflicted state that was resolved to remove path, but now they do. * jc/unresolve-removal: checkout: allow "checkout -m path" to unmerge removed paths checkout/restore: add basic tests for --merge checkout/restore: refuse unmerging paths unless checking out of the index update-index: remove stale fallback code for "--unresolve" update-index: use unmerge_index_entry() to support removal resolve-undo: allow resurrecting conflicted state that resolved to deletion update-index: do not read HEAD and MERGE_HEAD unconditionally	2023-10-02 11:20:00 -07:00
Dragan Simic	4ca7a3fd26	diff --stat: set the width defaults in a helper function Extract the commonly used initialization of the --stat-width=<width>, --stat-name-width=<width> and --stat-graph-with=<width> parameters to their internal default values into a helper function, to avoid repeating the same initialization code in a few places. Add a couple of tests to additionally cover existing configuration options diff.statNameWidth=<width> and diff.statGraphWidth=<width> when used by git-merge to generate --stat outputs. This closes the gap that existed previously in the --stat tests, and reduces the chances for having any regressions introduced by this commit. While there, perform a small bunch of minor wording tweaks in the improved unit test, to improve its test-level consistency a bit. Signed-off-by: Dragan Simic <dsimic@manjaro.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-29 15:46:06 -07:00
Calvin Wan	b1bda75173	parse: separate out parsing functions from config.h The files config.{h,c} contain functions that have to do with parsing, but not config. In order to further reduce all-in-one headers, separate out functions in config.c that do not operate on config into its own file, parse.h, and update the include directives in the .c files that need only such functions accordingly. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-29 15:14:57 -07:00
Junio C Hamano	a4eebfadf2	Merge branch 'jk/test-pass-ubsan-options-to-http-test' UBSAN options were not propagated through the test framework to git run via the httpd, unlike ASAN options, which has been corrected. * jk/test-pass-ubsan-options-to-http-test: test-lib: set UBSAN_OPTIONS to match ASan	2023-09-29 09:04:16 -07:00
Junio C Hamano	d15f92e379	Merge branch 'jc/alias-completion' The command line completion script (in contrib/) can be told to complete aliases by including ": git <cmd> ;" in the alias to tell it that the alias should be completed similar to how "git <cmd>" is completed. The parsing code for the alias as been loosened to allow ';' without an extra space before it. * jc/alias-completion: completion: loosen and document the requirement around completing alias	2023-09-29 09:04:15 -07:00
Junio C Hamano	5cd3f68add	Merge branch 'kh/range-diff-notes' "git range-diff --notes=foo" compared "log --notes=foo --notes" of the two ranges, instead of using just the specified notes tree. * kh/range-diff-notes: range-diff: treat notes like `log`	2023-09-29 09:04:15 -07:00
Junio C Hamano	0b493d2986	Merge branch 'ds/stat-name-width-configuration' "git diff" learned diff.statNameWidth configuration variable, to give the default width for the name part in the "--stat" output. * ds/stat-name-width-configuration: diff --stat: add config option to limit filename width	2023-09-29 09:04:15 -07:00
Junio C Hamano	2affeb3cb5	Merge branch 'jk/fsmonitor-unused-parameter' Unused parameters in fsmonitor related code paths have been marked as such. * jk/fsmonitor-unused-parameter: run-command: mark unused parameters in start_bg_wait callbacks fsmonitor: mark unused hashmap callback parameters fsmonitor/darwin: mark unused parameters in system callback fsmonitor: mark unused parameters in stub functions fsmonitor/win32: mark unused parameter in fsm_os__incompatible() fsmonitor: mark some maybe-unused parameters fsmonitor/win32: drop unused parameters fsmonitor: prefer repo_git_path() to git_pathdup()	2023-09-29 09:04:14 -07:00
Jeff King	5f259197ee	commit-graph: report incomplete chains during verification The load_commit_graph_chain_fd_st() function will stop loading chains when it sees an error. But if it has loaded any graph slice at all, it will return it. This is a good thing for normal use (we use what data we can, and this is just an optimization). But it's a bad thing for "commit-graph verify", which should be careful about finding any irregularities. We do complain to stderr with a warning(), but the verify command still exits with a successful return code. The new tests here cover corruption of both the base and tip slices of the chain. The corruption of the base file already works (it is the first file we look at, so when we see the error we return NULL). The "tip" case is what is fixed by this patch (it complains to stderr but still returns the base slice). Likewise the existing tests for corruption of the commit-graph-chain file itself need to be updated. We already exited non-zero correctly for the "base" case, but the "tip" case can now do so, too. Note that this also causes us to adjust a test later in the file that similarly corrupts a tip (though confusingly the test script calls this "base"). It checks stderr but erroneously expects the whole "verify" command to exit with a successful code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-28 07:00:43 -07:00
Jeff King	7754a565e2	commit-graph: tighten chain size check When we open a commit-graph-chain file, if it's smaller than a single entry, we just quietly treat that as ENOENT. That make some sense if the file is truly zero bytes, but it means that "commit-graph verify" will quietly ignore a file that contains garbage if that garbage happens to be short. Instead, let's only simulate ENOENT when the file is truly empty, and otherwise return EINVAL. The normal graph-loading routines don't care, but "commit-graph verify" will notice and complain about the difference. It's not entirely clear to me that the 0-is-ENOENT case actually happens in real life, so we could perhaps just eliminate this special-case altogether. But this is how we've always behaved, so I'm preserving it in the name of backwards compatibility (though again, it really only matters for "verify", as the regular routines are happy to load what they can). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-28 07:00:43 -07:00
Jeff King	47d06bb010	commit-graph: detect read errors when verifying graph chain Because it's OK to not have a graph file at all, the graph_verify() function needs to tell the difference between a missing file and a real error. So when loading a traditional graph file, we call open_commit_graph() separately from load_commit_graph_chain_fd_st(), and don't complain if the first one fails with ENOENT. When the function learned about chain files in `3da4b609bb` (commit-graph: verify chains with --shallow mode, 2019-06-18), we couldn't be as careful, since the only way to load a chain was with read_commit_graph_one(), which did both the open/load as a single unit. So we'll miss errors in chain files we load, thinking instead that there was just no chain file at all. Note that we do still report some of these problems to stderr, as the loading function calls error() and warning(). But we'd exit with a successful exit code, which is wrong. We can fix that by using the recently split open/load functions for chains. That lets us treat the chain file just like a single file with respect to error handling here. An existing test (from `3da4b609bb`) shows off the problem; we were expecting "commit-graph verify" to report success, but that makes no sense. We did not even verify the contents of the graph data, because we couldn't load it! I don't think this was an intentional exception, but rather just the test covering what happened to occur. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-28 07:00:43 -07:00
Jeff King	2d45710c5d	t5324: harmonize sha1/sha256 graph chain corruption In t5324.20, we corrupt a hex character 60 bytes into the graph chain file. Since the file consists of two hash identifiers, one per line, the corruption differs between sha1 and sha256. In a sha1 repository, the corruption is on the second line, and in a sha256 repository, it is on the first. We should of course detect the problem with either line. But as the next few patches will show (and fix), that is not the case (in fact, we currently do not exit non-zero for either line!). And while at the end of our series we'll catch all errors, our intermediate states will have differing behavior between the two hashes. Let's make sure we test corruption of both the first and second lines, and do so consistently with either hash by choosing offsets which are always in the first hash (30 bytes) or in the second (70). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-28 07:00:43 -07:00
Kousik Sanagavarapu	a3d2e83a17	ref-filter: add mailmap support Add mailmap support to ref-filter formats which are similar in pretty. This support is such that the following pretty placeholders are equivalent to the new ref-filter atoms: %aN = authorname:mailmap %cN = committername:mailmap %aE = authoremail:mailmap %aL = authoremail:mailmap,localpart %cE = committeremail:mailmap %cL = committeremail:mailmap,localpart Additionally, mailmap can also be used with ":trim" option for email by doing something like "authoremail:mailmap,trim". The above also applies for the "tagger" atom, that is, "taggername:mailmap", "taggeremail:mailmap", "taggeremail:mailmap,trim" and "taggername:mailmap,localpart". The functionality of ":trim" and ":localpart" remains the same. That is, ":trim" gives the email, but without the angle brackets and ":localpart" gives the part of the email before the '@' character (if such a character is not found then we directly grab everything between the angle brackets). Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-25 14:52:34 -07:00
Kousik Sanagavarapu	0144f0de77	t/t6300: introduce test_bad_atom Introduce a new function "test_bad_atom", which is similar to "test_atom()" but should be used to check whether the correct error message is shown on stderr. Like "test_atom", the new function takes three arguments. The three arguments specify the ref, the format and the expected error message respectively, with an optional fourth argument for tweaking "test_expect_*" (which is by default "success"). Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-25 14:52:33 -07:00
Kousik Sanagavarapu	04830eb762	t/t6300: cleanup test_atom Previously, when the executable part of "test_expect_{success,failure}" (inside "test_atom") got "eval"ed, it would have been syntactically incorrect if the second argument ($2, which is the format) to "test_atom" were enclosed in single quotes because the $variables would get interpolated even before the arguments to "test_expect_{success,failure}" are formed. So fix this and also some style issues along the way. Helped-by: Junio C Hamano <gitster@pobox.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-25 14:52:33 -07:00
Tang Yuyi	6a4c9e7b32	merge-tree: add -X strategy option Add merge strategy option to produce more customizable merge result such as automatically resolving conflicts. Signed-off-by: Tang Yuyi <winglovet@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-25 14:37:42 -07:00
Patrick Steinhardt	f97c8b1e00	revision: make pseudo-opt flags read via stdin behave consistently When reading revisions from stdin via git-rev-list(1)'s `--stdin` option then these revisions never honor flags like `--not` which have been passed on the command line. Thus, an invocation like e.g. `git rev-list --all --not --stdin` will not treat all revisions read from stdin as uninteresting. While this behaviour may be surprising to a user, it's been this way ever since it has been introduced via `42cabc341c` (Teach rev-list an option to read revs from the standard input., 2006-09-05). With that said, in `c40f0b7877` (revision: handle pseudo-opts in `--stdin` mode, 2023-06-15) we have introduced a new mode to read pseudo opts from standard input where this behaviour is a lot more confusing. If you pass `--not` via stdin, it will: - Influence subsequent revisions or pseudo-options passed on the command line. - Influence pseudo-options passed via standard input. - _Not_ influence normal revisions passed via standard input. This behaviour is extremely inconsistent and bound to cause confusion. While it would be nice to retroactively change the behaviour for how `--not` and `--stdin` behave together, chances are quite high that this would break existing scripts that expect the current behaviour that has been around for many years by now. This is thus not really a viable option to explore to fix the inconsistency. Instead, we change the behaviour of how pseudo-opts read via standard input influence the flags such that the effect is fully localized. With this change, when reading `--not` via standard input, it will: - _Not_ influence subsequent revisions or pseudo-options passed on the command line, which is a change in behaviour. - Influence pseudo-options passed via standard input. - Influence normal revisions passed via standard input, which is a change in behaviour. Thus, all flags read via standard input are fully self-contained to that standard input, only. While this is a breaking change as well, the behaviour has only been recently introduced with Git v2.42.0. Furthermore, the current behaviour can be regarded as a simple bug. With that in mind it feels like the right thing to retroactively change it and make the behaviour sane. Signed-off-by: Patrick Steinhardt <ps@pks.im> Reported-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-25 09:59:04 -07:00
Junio C Hamano	fb6e6e06d5	Merge branch 'jk/ort-unused-parameter-cleanups' Code clean-up. * jk/ort-unused-parameter-cleanups: merge-ort: lowercase a few error messages merge-ort: drop unused "opt" parameter from merge_check_renames_reusable() merge-ort: drop unused parameters from detect_and_process_renames() merge-ort: stop passing "opt" to read_oid_strbuf() merge-ort: drop custom err() function	2023-09-22 17:01:36 -07:00
Jeff King	252d693797	test-lib: set UBSAN_OPTIONS to match ASan For a long time we have used ASAN_OPTIONS to set abort_on_error. This is important because we want to notice detected problems even in programs which are expected to fail. But we never did the same for UBSAN_OPTIONS. This means that our UBSan test suite runs might silently miss some cases. It also causes a more visible effect, which is that t4058 complains about unexpected "fixes" (and this is how I noticed the issue): $ make SANITIZE=undefined CC=gcc && (cd t && ./t4058-*) ... ok 8 - git read-tree does not segfault # TODO known breakage vanished ok 9 - reset --hard does not segfault # TODO known breakage vanished ok 10 - git diff HEAD does not segfault # TODO known breakage vanished The tests themselves aren't that interesting. We have a known bug where these programs segfault, and they do when compiled without sanitizers. With UBSan, when the test runs: test_might_fail git read-tree --reset base it gets: cache-tree.c:935:9: runtime error: member access within misaligned address 0x5a5a5a5a5a5a5a5a for type 'struct cache_entry', which requires 8 byte alignment So that's garbage memory which would _usually_ cause us to segfault, but UBSan catches it and complains first about the alignment. That makes sense, but the weird thing is that UBSan then exits instead of aborting, so our test_might_fail call considers that an acceptable outcome and the test "passes". Curiously, this historically seems to have aborted, because I've run "make test" with UBSan many times (and so did our CI) and we never saw the problem. Even more curiously, I see an abort if I use clang with ASan and UBSan together, like: # this aborts! make SANITIZE=undefined,address CC=clang But not with just UBSan, and not with both when used with gcc: # none of these do make SANITIZE=undefined CC=gcc make SANITIZE=undefined CC=clang make SANITIZE=undefined,address CC=gcc Likewise moving to older versions of gcc (I tried gcc-11 and gcc-12 on my Debian system) doesn't abort. Nor does moving around in Git's history. Neither this test nor the relevant code have been touched in a while, and going back to v2.41.0 produces the same outcome (even though many UBSan CI runs have passed in the meantime). So _something_ changed on my system (and likely will soon on other people's, since this is stock Debian unstable), but I didn't track it further. I don't know why it ever aborted in the past, but we definitely should be explicit here and tell UBSan what we want to happen. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-21 14:10:36 -07:00
Junio C Hamano	8d73a2cc03	completion: loosen and document the requirement around completing alias Recently we started to tell users to spell ": git foo ;" with space(s) around 'foo' for an alias to be completed similarly to the 'git foo' command. It however is easy to also allow users to spell it in a more natural way with the semicolon attached to 'foo', i.e. ": git foo;". Also, add a comment to note that 'git' is optional and writing ": foo;" would complete the alias just fine. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-20 11:41:41 -07:00
Junio C Hamano	3c2af826a3	Merge branch 'jc/update-index-show-index-version' "git update-index" learns "--show-index-version" to inspect the index format version used by the on-disk index file. * jc/update-index-show-index-version: test-tool: retire "index-version" update-index: add --show-index-version update-index doc: v4 is OK with JGit and libgit2	2023-09-20 10:45:16 -07:00
Junio C Hamano	767e4d68c7	Merge branch 'ob/t3404-typofix' Code clean-up. * ob/t3404-typofix: t3404-rebase-interactive.sh: fix typos in title of a rewording test	2023-09-20 10:44:58 -07:00
Junio C Hamano	671eaaac0c	Merge branch 'js/diff-cached-fsmonitor-fix' "git diff --cached" codepath did not fill the necessary stat information for a file when fsmonitor knows it is clean and ended up behaving as if it is not clean, which has been corrected. * js/diff-cached-fsmonitor-fix: diff-lib: fix check_removed when fsmonitor is on	2023-09-20 10:44:57 -07:00
Junio C Hamano	7435d51bfd	Merge branch 'pw/diff-no-index-from-named-pipes' "git diff --no-index -R <(one) <(two)" did not work correctly, which has been corrected. * pw/diff-no-index-from-named-pipes: diff --no-index: fix -R with stdin	2023-09-20 10:44:57 -07:00

1 2 3 4 5 ...

21553 commits