development/git - HydraGit

mirror of https://github.com/git/git synced 2024-07-17 02:57:48 +00:00

Author	SHA1	Message	Date
Junio C Hamano	922cc26e41	Merge branch 'ps/do-not-trust-commit-graph-blindly-for-existence' into kn/rev-list-missing-fix * ps/do-not-trust-commit-graph-blindly-for-existence: commit: detect commits that exist in commit-graph but not in the ODB commit-graph: introduce envvar to disable commit existence checks	2023-11-01 12:06:55 +09:00
Patrick Steinhardt	7a5d604443	commit: detect commits that exist in commit-graph but not in the ODB Commit graphs can become stale and contain references to commits that do not exist in the object database anymore. Theoretically, this can lead to a scenario where we are able to successfully look up any such commit via the commit graph even though such a lookup would fail if done via the object database directly. As the commit graph is mostly intended as a sort of cache to speed up parsing of commits we do not want to have diverging behaviour in a repository with and a repository without commit graphs, no matter whether they are stale or not. As commits are otherwise immutable, the only thing that we really need to care about is thus the presence or absence of a commit. To address potentially stale commit data that may exist in the graph, our `lookup_commit_in_graph()` function will check for the commit's existence in both the commit graph, but also in the object database. So even if we were able to look up the commit's data in the graph, we would still pretend as if the commit didn't exist if it is missing in the object database. We don't have the same safety net in `parse_commit_in_graph_one()` though. This function is mostly used internally in "commit-graph.c" itself to validate the commit graph, and this usage is fine. We do expose its functionality via `parse_commit_in_graph()` though, which gets called by `repo_parse_commit_internal()`, and that function is in turn used in many places in our codebase. For all I can see this function is never used to directly turn an object ID into a commit object without additional safety checks before or after this lookup. What it is being used for though is to walk history via the parent chain of commits. So when commits in the parent chain of a graph walk are missing it is possible that we wouldn't notice if that missing commit was part of the commit graph. Thus, a query like `git rev-parse HEAD~2` can succeed even if the intermittent commit is missing. It's unclear whether there are additional ways in which such stale commit graphs can lead to problems. In any case, it feels like this is a bigger bug waiting to happen when we gain additional direct or indirect callers of `repo_parse_commit_internal()`. So let's fix the inconsistent behaviour by checking for object existence via the object database, as well. This check of course comes with a performance penalty. The following benchmarks have been executed in a clone of linux.git with stable tags added: Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] Range (min … max): 2.894 s … 2.950 s 10 runs Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] Range (min … max): 3.780 s … 3.961 s 10 runs Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] Range (min … max): 13.714 s … 13.995 s 10 runs Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] Range (min … max): 13.645 s … 14.038 s 10 runs Summary git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) We look at a ~30% regression in general, but in general we're still a whole lot faster than without the commit graph. To counteract this, the new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-01 12:04:06 +09:00
Patrick Steinhardt	e04838ea82	commit-graph: introduce envvar to disable commit existence checks Our `lookup_commit_in_graph()` helper tries to look up commits from the commit graph and, if it doesn't exist there, falls back to parsing it from the object database instead. This is intended to speed up the lookup of any such commit that exists in the database. There is an edge case though where the commit exists in the graph, but not in the object database. To avoid returning such stale commits the helper function thus double checks that any such commit parsed from the graph also exists in the object database. This makes the function safe to use even when commit graphs aren't updated regularly. We're about to introduce the same pattern into other parts of our code base though, namely `repo_parse_commit_internal()`. Here the extra sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was a newly introduced helper, and as such there was no performance hit by adding this sanity check. If we added `repo_parse_commit_internal()` with that sanity check right from the beginning as well, this would probably never have been an issue to begin with. But by retrofitting it with this sanity check now we do add a performance regression to preexisting code, and thus there is a desire to avoid this or at least give an escape hatch. In practice, there is no inherent reason why either of those functions should have the sanity check whereas the other one does not: either both of them are able to detect this issue or none of them should be. This also means that the default of whether we do the check should likely be the same for both. To err on the side of caution, we thus rather want to make `repo_parse_commit_internal()` stricter than to loosen the checks that we already have in `lookup_commit_in_graph()`. The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is the default, we will double check that commits looked up in the commit graph via `lookup_commit_in_graph()` also exist in the object database. This same check will also be added in `repo_parse_commit_internal()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-01 12:04:06 +09:00
Junio C Hamano	2e8e77cbac	The twenty-first batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-23 13:56:38 -07:00
Junio C Hamano	d12166d3c8	Merge branch 'en/docfixes' Documentation typo and grammo fixes. * en/docfixes: (25 commits) documentation: add missing parenthesis documentation: add missing quotes documentation: add missing fullstops documentation: add some commas where they are helpful documentation: fix whitespace issues documentation: fix capitalization documentation: fix punctuation documentation: use clearer prepositions documentation: add missing hyphens documentation: remove unnecessary hyphens documentation: add missing article documentation: fix choice of article documentation: whitespace is already generally plural documentation: fix singular vs. plural documentation: fix verb vs. noun documentation: fix adjective vs. noun documentation: fix verb tense documentation: employ consistent verb tense for a list documentation: fix subject/verb agreement documentation: remove extraneous words ...	2023-10-23 13:56:37 -07:00
Junio C Hamano	5edbcead42	Merge branch 'bc/racy-4gb-files' The index file has room only for lower 32-bit of the file size in the cached stat information, which means cached stat information will have 0 in its sd_size member for a file whose size is multiple of 4GiB. This is mistaken for a racily clean path. Avoid it by storing a bogus sd_size value instead for such files. * bc/racy-4gb-files: Prevent git from rehashing 4GiB files t: add a test helper to truncate files	2023-10-23 13:56:37 -07:00
Junio C Hamano	626f689f79	Merge branch 'jc/fail-stash-to-store-non-stash' Feeding "git stash store" with a random commit that was not created by "git stash create" now errors out. * jc/fail-stash-to-store-non-stash: stash: be careful what we store	2023-10-23 13:56:37 -07:00
Junio C Hamano	755fb09163	Merge branch 'so/diff-merges-dd' "git log" and friends learned "--dd" that is a short-hand for "--diff-merges=first-parent -p". * so/diff-merges-dd: completion: complete '--dd' diff-merges: introduce '--dd' option diff-merges: improve --diff-merges documentation	2023-10-23 13:56:37 -07:00
Junio C Hamano	f32af12cee	Merge branch 'jk/chunk-bounds' The codepaths that read "chunk" formatted files have been corrected to pay attention to the chunk size and notice broken files. * jk/chunk-bounds: (21 commits) t5319: make corrupted large-offset test more robust chunk-format: drop pair_chunk_unsafe() commit-graph: detect out-of-order BIDX offsets commit-graph: check bounds when accessing BIDX chunk commit-graph: check bounds when accessing BDAT chunk commit-graph: bounds-check generation overflow chunk commit-graph: check size of generations chunk commit-graph: bounds-check base graphs chunk commit-graph: detect out-of-bounds extra-edges pointers commit-graph: check size of commit data chunk midx: check size of revindex chunk midx: bounds-check large offset chunk midx: check size of object offset chunk midx: enforce chunk alignment on reading midx: check size of pack names chunk commit-graph: check consistency of fanout table midx: check size of oid lookup chunk commit-graph: check size of oid fanout chunk midx: stop ignoring malformed oid fanout chunk t: add library for munging chunk-format files ...	2023-10-23 13:56:36 -07:00
Junio C Hamano	ceadf0f3cf	The twentieth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-20 16:23:11 -07:00
Junio C Hamano	4835409be1	Merge branch 'ps/rewritten-is-per-worktree-doc' Doc update. * ps/rewritten-is-per-worktree-doc: doc/git-worktree: mention "refs/rewritten" as per-worktree refs	2023-10-20 16:23:11 -07:00
Junio C Hamano	92741d83c0	Merge branch 'ak/pretty-decorate-more-fix' Unlike "git log --pretty=%D", "git log --pretty="%(decorate)" did not auto-initialize the decoration subsystem, which has been corrected. * ak/pretty-decorate-more-fix: pretty: fix ref filtering for %(decorate) formats	2023-10-20 16:23:11 -07:00
Junio C Hamano	6b1e2254d6	Merge branch 'vd/loose-ref-iteration-optimization' The code to iterate over loose references have been optimized to reduce the number of lstat() system calls. * vd/loose-ref-iteration-optimization: files-backend.c: avoid stat in 'loose_fill_ref_dir' dir.[ch]: add 'follow_symlink' arg to 'get_dtype' dir.[ch]: expose 'get_dtype' ref-cache.c: fix prefix matching in ref iteration	2023-10-20 16:23:11 -07:00
Junio C Hamano	c662038629	Merge branch 'ty/merge-tree-strategy-options' "git merge-tree" learned to take strategy backend specific options via the "-X" option, like "git merge" does. * ty/merge-tree-strategy-options: merge: introduce {copy\|clear}_merge_options() merge-tree: add -X strategy option	2023-10-20 16:23:11 -07:00
Junio C Hamano	813d9a9188	The nineteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-18 13:25:42 -07:00
Junio C Hamano	7906b5c957	Merge branch 'jc/merge-ort-attr-index-fix' Fix "git merge-tree" to stop segfaulting when the --attr-source option is used. * jc/merge-ort-attr-index-fix: merge-ort: initialize repo in index state	2023-10-18 13:25:42 -07:00
Junio C Hamano	cc7d7183f0	Merge branch 'sn/cat-file-doc-update' "git cat-file" documentation updates. * sn/cat-file-doc-update: doc/cat-file: make synopsis and description less confusing	2023-10-18 13:25:41 -07:00
Junio C Hamano	0bc6bff9d5	Merge branch 'xz/commit-title-soft-limit-doc' Doc update. * xz/commit-title-soft-limit-doc: doc: correct the 50 characters soft limit (+)	2023-10-18 13:25:41 -07:00
Junio C Hamano	79861babe2	Merge branch 'tb/repack-max-cruft-size' "git repack" learned "--max-cruft-size" to prevent cruft packs from growing without bounds. * tb/repack-max-cruft-size: repack: free existing_cruft array after use builtin/repack.c: avoid making cruft packs preferred builtin/repack.c: implement support for `--max-cruft-size` builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE t7700: split cruft-related tests to t7704	2023-10-18 13:25:41 -07:00
Jeff King	7538f9d89b	t5319: make corrupted large-offset test more robust The test t5319.88 ("reader bounds-checks large offset table") can fail intermittently. The failure mode looks like this: 1. An earlier test sets up "objects64", a directory that can be used to produce a midx with a corrupted large-offsets table. To get the large offsets, it corrupts the normal ".idx" file to have a fake large offset, and then builds a midx from that. That midx now has a large offset table, which is what we want. But we also have a .idx on disk that has a corrupted entry. We'll call the object with the corrupted large-offset "X". 2. In t5319.88, we further corrupt the midx by reducing the size of the large-offset chunk (because our goal is to make sure we do not do an out-of-bounds read on it). 3. We then enumerate all of the objects with "cat-file --batch-check --batch-all-objects", expecting to see a complaint when we try to show object X. We use --batch-all-objects because our objects64 repo doesn't actually have any refs (but if we check them all, one of them will be the failing one). The default batch-check format includes %(objecttype) and %(objectsize), both of which require us to access the actual pack data (and thus requires looking at the offset). 4a. Usually, this succeeds. We try to output object X, do a lookup via the midx for the type/size lookup, and run into the corrupt large-offset table. 4b. But sometimes we hit a different error. If another object points to X as a delta base, then trying to find the type of that object requires walking the delta chain to the base entry (since only the base has the concrete type; deltas themselves are either OFS_DELTA or REF_DELTA). Normally this would not require separate offset lookups at all, as deltas are usually stored as OFS_DELTA, specifying the relative offset to the base. But the corrupt idx created in step 1 is done directly with "git pack-objects" and does not pass the --delta-base-offset option, meaning we have REF_DELTA entries! Those do have to consult an index to find the location of the base object, and they use the pack .idx to do this. The same pack .idx that we know is corrupted from step 1! Git does notice the error, but it does so by seeing the corrupt .idx file, not the corrupt midx file, and the error it reports is different, causing the test to fail. The set of objects created in the test is deterministic. But the delta selection seems not to be (which is not too surprising, as it is multi-threaded). I have seen the failure in Windows CI but haven't reproduced it locally (not even with --stress). Re-running a failed Windows CI job tends to work. But when I download and examine the trash directory from a failed run, it shows a different set of deltas than I get locally. But the exact source of non-determinism isn't that important; our test should be robust against any order. There are a few options to fix this: a. It would be OK for the "objects64" setup to "unbreak" the .idx file after generating the midx. But then it would be hard for subsequent tests to reuse it, since it is the corrupted idx that forces the midx to have a large offset table. b. The "objects64" setup could use --delta-base-offset. This would fix our problem, but earlier tests have many hard-coded offsets. Using OFS_DELTA would change the locations of objects in the pack (this might even be OK because I think most of the offsets are within the .idx file, but it seems brittle and I'm afraid to touch it). c. Our cat-file output is in oid order by default. Since we store bases before deltas, if we went in pack order (using the "--unordered" flag), we'd always see our corrupt X before any delta which depends on it. But using "--unordered" means we skip the midx entirely. That makes sense, since it is just enumerating all of the packs, using the offsets found in their .idx files directly. So it doesn't work for our test. d. We could ask directly about object X, rather than enumerating all of them. But that requires further hard-coding of the oid (both sha1 and sha256) of object X. I'd prefer not to introduce more brittleness. e. We can use a --batch-check format that looks at the pack data, but doesn't have to chase deltas. The problem in this case is %(objecttype), which has to walk to the base. But %(objectsize) does not; we can get the value directly from the delta itself. Another option would be %(deltabase), where we report the REF_DELTA name but don't look at its data. I've gone with option (e) here. It's kind of subtle, but it's simple and has no side effects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-14 10:17:25 -07:00
Junio C Hamano	a9ecda2788	The eighteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-13 14:18:29 -07:00
Junio C Hamano	2920971a7f	Merge branch 'jk/decoration-and-other-leak-fixes' Leakfix. * jk/decoration-and-other-leak-fixes: daemon: free listen_addr before returning revision: clear decoration structs during release_revisions() decorate: add clear_decoration() function	2023-10-13 14:18:28 -07:00
Junio C Hamano	09dcbb486d	Merge branch 'ar/diff-index-merge-base-fix' "git diff --merge-base X other args..." insisted that X must be a commit and errored out when given an annotated tag that peels to a commit, but we only need it to be a committish. This has been corrected. * ar/diff-index-merge-base-fix: diff: fix --merge-base with annotated tags	2023-10-13 14:18:28 -07:00
Junio C Hamano	b32f5b6b34	Merge branch 'js/submodule-fix-misuse-of-path-and-name' In .gitmodules files, submodules are keyed by their names, and the path to the submodule whose name is $name is specified by the submodule.$name.path variable. There were a few codepaths that mixed the name and path up when consulting the submodule database, which have been corrected. It took long for these bugs to be found as the name of a submodule initially is the same as its path, and the problem does not surface until it is moved to a different path, which apparently happens very rarely. * js/submodule-fix-misuse-of-path-and-name: t7420: test that we correctly handle renamed submodules t7419: test that we correctly handle renamed submodules t7419, t7420: use test_cmp_config instead of grepping .gitmodules t7419: actually test the branch switching submodule--helper: return error from set-url when modifying failed submodule--helper: use submodule_from_path in set-{url,branch}	2023-10-13 14:18:28 -07:00
Junio C Hamano	a45eddec40	Merge branch 'jk/commit-graph-leak-fixes' Leakfix. * jk/commit-graph-leak-fixes: commit-graph: clear oidset after finishing write commit-graph: free write-context base_graph_name during cleanup commit-graph: free write-context entries before overwriting commit-graph: free graph struct that was not added to chain commit-graph: delay base_graph assignment in add_graph_to_chain() commit-graph: free all elements of graph chain commit-graph: move slab-clearing to close_commit_graph() merge: free result of repo_get_merge_bases() commit-reach: free temporary list in get_octopus_merge_bases() t6700: mark test as leak-free	2023-10-13 14:18:28 -07:00
Junio C Hamano	c75e91499b	Merge branch 'la/trailer-test-and-doc-updates' Test coverage for trailers has been improved. * la/trailer-test-and-doc-updates: trailer doc: <token> is a <key> or <keyAlias>, not both trailer doc: separator within key suppresses default separator trailer doc: emphasize the effect of configuration variables trailer --unfold help: prefer "reformat" over "join" trailer --parse docs: add explanation for its usefulness trailer --only-input: prefer "configuration variables" over "rules" trailer --parse help: expose aliased options trailer --no-divider help: describe usual "---" meaning trailer: trailer location is a place, not an action trailer doc: narrow down scope of --where and related flags trailer: add tests to check defaulting behavior with --no-* flags trailer test description: this tests --where=after, not --where=before trailer tests: make test cases self-contained	2023-10-13 14:18:27 -07:00
Junio C Hamano	e56b9edf22	Merge branch 'ds/mailmap-entry-update' Update mailmap entry for Derrick. * ds/mailmap-entry-update: mailmap: change primary address for Derrick Stolee	2023-10-13 14:18:27 -07:00
Jason Hatton	5143ac07b1	Prevent git from rehashing 4GiB files The index stores file sizes using a uint32_t. This causes any file that is a multiple of 2^32 to have a cached file size of zero. Zero is a special value used by racily clean. This causes git to rehash every file that is a multiple of 2^32 every time git status or git commit is run. This patch mitigates the problem by making all files that are a multiple of 2^32 appear to have a size of 1<<31 instead of zero. The value of 1<<31 is chosen to keep it as far away from zero as possible to help prevent things getting mixed up with unpatched versions of git. An example would be to have a 2^32 sized file in the index of patched git. Patched git would save the file as 2^31 in the cache. An unpatched git would very much see the file has changed in size and force it to rehash the file, which is safe. The file would have to grow or shrink by exactly 2^31 and retain all of its ctime, mtime, and other attributes for old git to not notice the change. This patch does not change the behavior of any file that is not an exact multiple of 2^32. Signed-off-by: Jason D. Hatton <jhatton@globalfinishing.com> Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-13 13:33:35 -07:00
brian m. carlson	678eb55f5d	t: add a test helper to truncate files In a future commit, we're going to work with some large files which will be at least 4 GiB in size. To take advantage of the sparseness functionality on most Unix systems and avoid running the system out of disk, it would be convenient to use truncate(2) to simply create a sparse file of sufficient size. However, the GNU truncate(1) utility isn't portable, so let's write a tiny test helper that does the work for us. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-13 13:33:35 -07:00
Junio C Hamano	59167d7d09	The seventeenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-12 12:18:27 -07:00
Junio C Hamano	4ae4c70577	Merge branch 'js/ci-coverity' GitHub CI workflow has learned to trigger Coverity check. * js/ci-coverity: coverity: detect and report when the token or project is incorrect coverity: allow running on macOS coverity: support building on Windows coverity: allow overriding the Coverity project coverity: cache the Coverity Build Tool ci: add a GitHub workflow to submit Coverity scans	2023-10-12 12:18:27 -07:00
Junio C Hamano	c70e7a3cfd	Merge branch 'jm/git-status-submodule-states-docfix' Docfix. * jm/git-status-submodule-states-docfix: git-status.txt: fix minor asciidoc format issue	2023-10-12 12:18:26 -07:00
Junio C Hamano	6e47cfcffc	Merge branch 'rs/parse-opt-ctx-cleanup' Code clean-up. * rs/parse-opt-ctx-cleanup: parse-options: drop unused parse_opt_ctx_t member	2023-10-12 12:18:26 -07:00
Derrick Stolee	6e5457d8c7	mailmap: change primary address for Derrick Stolee The previous primary address is no longer valid. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-12 10:59:36 -07:00
Junio C Hamano	d9b6634589	stash: be careful what we store "git stash store" is meant to store what "git stash create" produces, as these two are implementation details of the end-user facing "git stash save" command. Even though it is clearly documented as such, users would try silly things like "git stash store HEAD" to render their stash unusable. Worse yet, because "git stash drop" does not allow such a stash entry to be removed, "git stash clear" would be the only way to recover from such a mishap. Reuse the logic that allows "drop" to refrain from working on such a stash entry to teach "store" to avoid storing an object that is not a stash entry in the first place. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-11 16:27:30 -07:00
Junio C Hamano	b182658e3e	merge: introduce {copy\|clear}_merge_options() When mostly the same set of options are to be used to perform multiple merges, one instance of the merge_options structure may want to be created and used by copying from the same template instance. We saw such a use recently in "git merge-tree". Let's make the pattern official by introducing copy_merge_options() as a supported way to make a copy of the structure, and also give clear_merge_options() to release any resources held by a copied instance. Currently we only make a shallow copy, so the former is a mere structure assignment while the latter is a no-op, but this may change in the future as the members of merge_options structure evolve. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-11 13:37:47 -07:00
Junio C Hamano	aab89be2eb	The sixteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-10 11:39:15 -07:00
Junio C Hamano	1fdedb7c7d	Merge branch 'cc/repack-sift-filtered-objects-to-separate-pack' "git repack" machinery learns to pay attention to the "--filter=" option. * cc/repack-sift-filtered-objects-to-separate-pack: gc: add `gc.repackFilterTo` config option repack: implement `--filter-to` for storing filtered out objects gc: add `gc.repackFilter` config option repack: add `--filter=<filter-spec>` option pack-bitmap-write: rebuild using new bitmap when remapping repack: refactor finding pack prefix repack: refactor finishing pack-objects command t/helper: add 'find-pack' test-tool pack-objects: allow `--filter` without `--stdout`	2023-10-10 11:39:15 -07:00
Junio C Hamano	afb0d0880a	Merge branch 'ds/init-diffstat-width' Code clean-up. * ds/init-diffstat-width: diff --stat: set the width defaults in a helper function	2023-10-10 11:39:14 -07:00
Junio C Hamano	a7a2d10421	Merge branch 'cw/prelim-cleanup' Shuffle some bits across headers and sources to prepare for libification effort. * cw/prelim-cleanup: parse: separate out parsing functions from config.h config: correct bad boolean env value error message wrapper: reduce scope of remove_or_warn() hex-ll: separate out non-hash-algo functions	2023-10-10 11:39:14 -07:00
Junio C Hamano	3df51ea0a5	Merge branch 'eb/limit-bulk-checkin-to-blobs' The "streaming" interface used for bulk-checkin codepath has been narrowed to take only blob objects for now, with no real loss of functionality. * eb/limit-bulk-checkin-to-blobs: bulk-checkin: only support blobs in index_bulk_checkin	2023-10-10 11:39:14 -07:00
Patrick Steinhardt	8b3aa36f5a	doc/git-worktree: mention "refs/rewritten" as per-worktree refs Some references are special in the context of worktrees as they are considered to be per-worktree instead of shared across all of the worktrees. Most importantly, this includes "refs/worktree/" that have explicitly been designed such that users can create per-woorktree refs. But there are also special references that have an associated meaning like "refs/bisect/", which is used to track state of git-bisect(1). These special per-worktree references are documented in git-worktree(1), but one instance is missing. In `a9be29c981` (sequencer: make refs generated by the `label` command worktree-local, 2018-04-25), we have converted "refs/rewritten/" to be a per-worktree reference as well. These references are used by our sequencer infrastructure to generate labels for rebased commits. So in order to allow for multiple concurrent rebases to happen in different worktrees, these references need to be tracked per worktree. We forgot to update our documentation to mention these new per-worktree references, which is fixed by this patch. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-10 09:23:16 -07:00
Jeff King	ca06f0fe5d	chunk-format: drop pair_chunk_unsafe() There are no callers left, and we don't want anybody to add new ones (they should use the not-unsafe version instead). So let's drop the function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:02 -07:00
Jeff King	12192a9db9	commit-graph: detect out-of-order BIDX offsets The BIDX chunk tells us the offsets at which each commit's Bloom filters can be found in the BDAT chunk. We compute the length of each filter by checking the offsets of neighbors and subtracting them. If the offsets are out of order, then we'll get a negative length, which we then store as a very large unsigned value. This can cause us to read out-of-bounds memory, as we access the hash data modulo "filter->len * BITS_PER_WORD". We can easily detect this case when loading the individual filters. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:02 -07:00
Jeff King	581e0f8b18	commit-graph: check bounds when accessing BIDX chunk We load the bloom_filter_indexes chunk using pair_chunk(), so we have no idea how big it is. This can lead to out-of-bounds reads if it is smaller than expected, since we index it based on the number of commits found elsewhere in the graph file. We can check the chunk size up front, like we do for CDAT and other chunks with one fixed-size record per commit. The test case demonstrates the problem. It actually won't segfault, because we end up reading random data from the follow-on chunk (BDAT in this case), and the bounds checks added in the previous patch complain. But this is by no means assured, and you can craft a commit-graph file with BIDX at the end (or a smaller BDAT) that does segfault. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	920f400e91	commit-graph: check bounds when accessing BDAT chunk When loading Bloom filters from a commit-graph file, we use the offset values in the BIDX chunk to index into the memory mapped for the BDAT chunk. But since we don't record how big the BDAT chunk is, we just trust that the BIDX offsets won't cause us to read outside of the chunk memory. A corrupted or malicious commit-graph file will cause us to segfault (in practice this isn't a very interesting attack, since commit-graph files are local-only, and the worst case is an out-of-bounds read). We can't fix this by checking the chunk size during parsing, since the data in the BDAT chunk doesn't have a fixed size (that's why we need the BIDX in the first place). So we'll fix it in two parts: 1. Record the BDAT chunk size during parsing, and then later check that the BIDX offsets we look up are within bounds. 2. Because the offsets are relative to the end of the BDAT header, we must also make sure that the BDAT chunk is at least as large as the expected header size. Otherwise, we overflow when trying to move past the header, even for an offset of "0". We can check this early, during the parsing stage. The error messages are rather verbose, but since this is not something you'd expect to see outside of severe bugs or corruption, it makes sense to err on the side of too many details. Sadly we can't mention the filename during the chunk-parsing stage, as we haven't set g->filename at this point, nor passed it down through the stack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	ee6a792412	commit-graph: bounds-check generation overflow chunk If the generation entry in a commit-graph doesn't fit, we instead insert an offset into a generation overflow chunk. But since we don't record the size of the chunk, we may read outside the chunk if the offset we find on disk is malicious or corrupted. We can't check the size of the chunk up-front; it will vary based on how many entries need overflow. So instead, we'll do a bounds-check before accessing the chunk memory. Unfortunately there is no error-return from this function, so we'll just have to die(), which is what it does for other forms of corruption. As with other cases, we can drop the st_mult() call, since we know our bounds-checked value will fit within a size_t. Before this patch, the test here actually "works" because we read garbage data from the next chunk. And since that garbage data happens not to provide a generation number which changes the output, it appears to work. We could construct a case that actually segfaults or produces wrong output, but it would be a bit tricky. For our purposes its sufficient to check that we've detected the bounds error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	4a3c34662b	commit-graph: check size of generations chunk We neither check nor record the size of the generations chunk we parse from a commit-graph file. This should have one uint32_t for each commit in the file; if it is smaller (due to corruption, etc), we may read outside the mapped memory. The included test segfaults without this patch, as it shrinks the size considerably (and the chunk is near the end of the file, so we read off the end of the array rather than accidentally reading another chunk). We can fix this by checking the size up front (like we do for other fixed-size chunks, like CDAT). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	6cf61d0db5	commit-graph: bounds-check base graphs chunk When we are loading a commit-graph chain, we check that each slice of the chain points to the appropriate set of base graphs via its BASE chunk. But since we don't record the size of the chunk, we may access out-of-bounds memory if the file is corrupted. Since we know the number of entries we expect to find (based on the position within the commit-graph-chain file), we can just check the size up front. In theory this would also let us drop the st_mult() call a few lines later when we actually access the memory, since we know that the computed offset will fit in a size_t. But because the operands "g->hash_len" and "n" have types "unsigned char" and "int", we'd have to cast to size_t first. Leaving the st_mult() does that cast, and makes it more obvious that we don't have an overflow problem. Note that the test does not actually segfault before this patch, since it just reads garbage from the chunk after BASE (and indeed, it even rejects the file because that garbage does not have the expected hash value). You could construct a file with BASE at the end that did segfault, but corrupting the existing one is easy, and we can check stderr for the expected message. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	9622610e55	commit-graph: detect out-of-bounds extra-edges pointers If an entry in a commit-graph file has more than 2 parents, the fixed-size parent fields instead point to an offset within an "extra edges" chunk. We blindly follow these, assuming that the chunk is present and sufficiently large; this can lead to an out-of-bounds read for a corrupt or malicious file. We can fix this by recording the size of the chunk and adding a bounds-check in fill_commit_in_graph(). There are a few tricky bits: 1. We'll switch from working with a pointer to an offset. This makes some corner cases just fall out naturally: a. If we did not find an EDGE chunk at all, our size will correctly be zero (so everything is "out of bounds"). b. Comparing "size / 4" lets us make sure we have at least 4 bytes to read, and we never compute a pointer more than one element past the end of the array (computing a larger pointer is probably OK in practice, but is technically undefined behavior). c. The current code casts to "uint32_t ". Replacing it with an offset avoids any comparison between different types of pointer (since the chunk is stored as "unsigned char "). 2. This is the first case in which fill_commit_in_graph() may return anything but success. We need to make sure to roll back the "parsed" flag (and any parents we might have added before running out of buffer) so that the caller can cleanly fall back to loading the commit object itself. It's a little non-trivial to do this, and we might benefit from factoring it out. But we can wait on that until we actually see a second case where we return an error. As a bonus, this lets us drop the st_mult() call. Since we've already done a bounds check, we know there won't be any integer overflow (it would imply our buffer is larger than a size_t can hold). The included test does not actually segfault before this patch (though you could construct a case where it does). Instead, it reads garbage from the next chunk which results in it complaining about a bogus parent id. This is sufficient for our needs, though (we care that the fallback succeeds, and that stderr mentions the out-of-bounds read). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00

1 2 3 4 5 ...

71361 commits