development/git - git

mirror of https://github.com/git/git synced 2024-07-05 00:58:49 +00:00

Author	SHA1	Message	Date
Junio C Hamano	f95bafbaed	Merge branch 'ps/commit-graph-write-leakfix' Leakfix. * ps/commit-graph-write-leakfix: commit-graph: fix memory leak when not writing graph	2024-01-26 08:54:45 -08:00
Junio C Hamano	4cc0f8e8fa	Merge branch 'jk/commit-graph-slab-clear-fix' Clearing in-core repository (happens during e.g., "git fetch --recurse-submodules" with commit graph enabled) made in-core commit object in an inconsistent state by discarding the necessary data from commit-graph too early, which has been corrected. * jk/commit-graph-slab-clear-fix: commit-graph: retain commit slab when closing NULL commit_graph	2024-01-16 10:11:58 -08:00
Patrick Steinhardt	4efa9308ea	commit-graph: fix memory leak when not writing graph When `write_commit_graph()` bails out writing a split commit-graph early then it may happen that we have already gathered the set of existing commit-graph file names without yet determining the new merged set of files. This can result in a memory leak though because we only clear the preimage of files when we have collected the postimage. Fix this issue by dropping the condition altogether so that we always try to free both preimage and postimage filenames. As the context structure is zero-initialized this simplification is safe to do. Signed-off-by: Patrick Steinhardt <ps@pks.im> Acked-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-15 17:08:28 -08:00
Junio C Hamano	492ee03f60	Merge branch 'en/header-cleanup' Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files	2024-01-08 14:05:15 -08:00
Jeff King	d70f554cdf	commit-graph: retain commit slab when closing NULL commit_graph This fixes a regression introduced in `ac6d45d11f` (commit-graph: move slab-clearing to close_commit_graph(), 2023-10-03), in which running: git -c fetch.writeCommitGraph=true fetch --recurse-submodules multiple times in a freshly cloned repository causes a segfault. What happens in the second (and subsequent) runs is this: 1. We make a "struct commit" for any ref tips which we're storing (even if we already have them, they still go into FETCH_HEAD). Because the first run will have created a commit graph, we'll find those commits in the graph. The commit struct is therefore created with a NULL "maybe_tree" entry, because we can load its oid from the graph later. But to do that we need to remember that we got the commit from the graph, which is recorded in a global commit_graph_data_slab object. 2. Because we're using --recurse-submodules, we'll try to fetch each of the possible submodules. That implies creating a separate "struct repository" in-process for each submodule, which will require a later call to repo_clear(). The call to repo_clear() calls raw_object_store_clear(), which in turn calls close_object_store(), which in turn calls close_commit_graph(). And the latter frees the commit graph data slab. 3. Later, when trying to write out a new commit graph, we'll ask for their tree oid via get_commit_tree_oid(), which will see that the object is parsed but with a NULL maybe_tree field. We'd then usually pull it from the graph file, but because the slab was cleared, we don't realize that we can do so! We end up returning NULL and segfaulting. (It seems questionable that we'd write a graph entry for such a commit anyway, since we know we already have one. I didn't double-check, but that may simply be another side effect of having cleared the slab). The bug is in step (2) above. We should not be clearing the slab when cleaning up the submodule repository structs. Prior to `ac6d45d11f`, we did not do so because it was done inside a helper function that returned early when it saw NULL. So the behavior change from that commit is that we'll now _always_ clear the slab via repo_clear(), even if the repository being closed did not have a commit graph (and thus would have a NULL commit_graph struct). The most immediate fix is to add in a NULL check in close_commit_graph(), making it a true noop when passed in an object_store with a NULL commit_graph (it's OK to just return early, since the rest of its code is already a noop when passed NULL). That restores the pre-ac6d45d11f behavior. And that's what this patch does, along with a test that exercises it (we already have a test that uses submodules along with fetch.writeCommitGraph, but the bug only triggers when there is a subsequent fetch and when that fetch uses --recurse-submodules). So that fixes the regression in the least-risky way possible. I do think there's some fragility here that we might want to follow up on. We have a global commit_graph_data_slab that contains graph positions, and our global commit structs depend on the that slab remaining valid. But close_commit_graph() is just about closing _one_ object store's graph. So it's dangerous to call that function and clear the slab without also throwing away any "struct commit" we might have parsed that depends on it. Which at first glance seems like a bug we could already trigger. In the situation described here, there is no commit graph in the submodule repository, so our commit graph is NULL (in fact, in our test script there is no submodule repo at all, so we immediately return from repo_init() and call repo_clear() only to free up memory). But what would happen if there was one? Wouldn't we see a non-NULL commit_graph entry, and then clear the global slab anyway? The answer is "no", but for very bizarre reasons. Remember that repo_clear() calls raw_object_store_clear(), which then calls close_object_store() and thus close_commit_graph(). But before it does so, raw_object_store_clear() does something else: it frees the commit graph and sets it to NULL! So by this code path we'll _never_ see a non-NULL commit_graph struct, and thus never clear the slab. So it happens to work out. But it still seems questionable to me that we would clear a global slab (which might still be in use) when closing the commit graph. This clearing comes from `957ba814bf` (commit-graph: when closing the graph, also release the slab, 2021-09-08), and was fixing a case where we really did need it to be closed (and in that case we presumably call close_object_store() more directly). So I suspect there may still be a bug waiting to happen there, as any object loaded before the call to close_object_store() may be stranded with a bogus maybe_tree entry (and thus looking at it after the call might cause an error). But I'm not sure how to trigger it, nor what the fix should look like (you probably would need to "unparse" any objects pulled from the graph). And so this patch punts on that for now in favor of fixing the recent regression in the most direct way, which should not have any other fallouts. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-05 08:35:26 -08:00
Elijah Newren	d57c671a51	treewide: remove unnecessary includes in source files Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-26 12:04:33 -08:00
Elijah Newren	ec2101abf3	treewide: add direct includes currently only pulled in transitively The next commit will remove a bunch of unnecessary includes, but to do so, we need some of the lower level direct includes that files rely on to be explicitly specified. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-26 12:04:32 -08:00
Elijah Newren	eea0e59ffb	treewide: remove unnecessary includes in source files Each of these were checked with gcc -E -I. ${SOURCE_FILE} \| grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-26 12:04:31 -08:00
Junio C Hamano	66685e8555	Merge branch 'ps/commit-graph-less-paranoid' Earlier we stopped relying on commit-graph that (still) records information about commits that are lost from the object store, which has negative performance implications. The default has been flipped to disable this pessimization. * ps/commit-graph-less-paranoid: commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default	2023-12-18 14:10:11 -08:00
Junio C Hamano	34401b7ddb	Merge branch 'jk/chunk-bounds-more' Code clean-up for jk/chunk-bounds topic. * jk/chunk-bounds-more: commit-graph: mark chunk error messages for translation commit-graph: drop verify_commit_graph_lite() commit-graph: check order while reading fanout chunk commit-graph: use fanout value for graph size commit-graph: abort as soon as we see a bogus chunk commit-graph: clarify missing-chunk error messages commit-graph: drop redundant call to "lite" verification midx: check consistency of fanout table commit-graph: handle overflow in chunk_size checks	2023-12-09 16:37:48 -08:00
Patrick Steinhardt	b1df3b3867	commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default In `7a5d604443` (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via `f559d6d45e` (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas `7a5d604443` initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-26 10:10:00 +09:00
Jeff King	e020391673	commit-graph: mark chunk error messages for translation The patches from `f32af12cee` (Merge branch 'jk/chunk-bounds', 2023-10-23) added many new untranslated error messages. While it's unlikely for most users to see these messages at all, most of the other commit-graph error messages are translated (and likewise for the matching midx messages). Let's mark them all for consistency (and to help any poor unfortunate user who does manage to find a broken graph file). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-09 19:07:54 +09:00
Jeff King	f4e4756c54	commit-graph: drop verify_commit_graph_lite() As we've moved all of the checks from this function directly into the chunk-reading code used by the caller (and there is only one caller), we can just drop it entirely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-09 19:07:54 +09:00
Jeff King	06fb135f8e	commit-graph: check order while reading fanout chunk We read the fanout chunk, storing a pointer to it, but only confirm that the entries are monotonic in a final "lite" verification step. Let's move that into the actual OIDF chunk callback, so that we can report problems immediately (for all the reasons given in the previous "commit-graph: abort as soon as we see a bogus chunk" commit). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-09 19:07:53 +09:00
Jeff King	d3b6f6c631	commit-graph: use fanout value for graph size Commit-graph, midx, and pack idx files all have both a lookup table of oids and an oid fanout table. In midx and pack idx files, we take the final entry of the fanout table as the source of truth for the number of entries, and then verify that the size of the lookup table matches that. But for commit-graph files, we do the opposite: we use the size of the lookup table as the source of truth, and then check the final fanout entry against it. As noted in `4169d89645` (commit-graph: check consistency of fanout table, 2023-10-09), either is correct. But there are a few reasons to prefer the fanout table as the source of truth: 1. The fanout entries are 32-bits on disk, and that defines the maximum number of entries we can store. But since the size of the lookup table is only bounded by the filesystem, it can be much larger. And hence computing it as the commit-graph does means that we may truncate the result when storing it in a uint32_t. 2. We read the fanout first, then the lookup table. If we're verifying the chunks as we read them, then we'd want to take the fanout as truth (we have nothing yet to check it against) and then we can check that the lookup table matches what we already know. 3. It is pointlessly inconsistent with the midx and pack idx code. Since the three have to do similar size and bounds checks, it is easier to reason about all three if they use the same approach. So this patch moves the assignment of g->num_commits to the fanout parser, and then we can check the size of the lookup chunk as soon as we try to load it. There's already a test covering this situation, which munges the final fanout entry to 2^32-1. In the current code we complain that it does not agree with the table size. But now that we treat the munged value as the source of truth, we'll complain that the lookup table is the wrong size (again, either is correct). So we'll have to update the message we expect (and likewise for an earlier test which does similar munging). There's a similar test for this situation on the midx side, but rather than making a very-large fanout value, it just truncates the lookup table. We could do that here, too, but the very-large fanout value actually shows an interesting corner case. On a 32-bit system, multiplying to find the expected table size would cause an integer overflow. Using st_mult() would detect that, but cause us to die() rather than falling back to the non-graph code path. Checking the size using division (as we do with existing chunk-size checks) avoids the overflow entirely, and the test demonstrates this when run on a 32-bit system. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-09 19:07:53 +09:00
Jeff King	8bd40ed2ae	commit-graph: abort as soon as we see a bogus chunk The code to read commit-graph files tries to read all of the required chunks, but doesn't abort if we can't find one (or if it's corrupted). It's only at the end of reading the file that we then do some sanity checks for NULL entries. But it's preferable to detect the errors and bail immediately, for a few reasons: 1. It's less error-prone. It's easy in the reader functions to flag an error but still end up setting some struct fields (an error I in fact made while working on this patch series). 2. It's safer. Since verifying some chunks depends on the values of other chunks, we may be depending on not-yet-verified data. I don't know offhand of any case where this can cause problems, but it's one less subtle thing to worry about in the reader code. 3. It prevents the user from seeing nonsense errors. If we're missing an OIDL chunk, then g->num_commits will be zero. And so we may complain that the size of our CDAT chunk (which should have a fixed-size record for each commit) is wrong unless it's also zero. But that's misleading; the problem is the missing OIDL chunk; the CDAT one might be fine! So let's just check the return value from read_chunk(). This is exactly how the midx chunk-reading code does it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-09 19:07:53 +09:00
Jeff King	93d2924729	commit-graph: clarify missing-chunk error messages When a required commit-graph chunk cannot be loaded, we leave its entry in the struct NULL, and then later complain that it is missing. But that's just one reason we might not have loaded it, as we also do some data quality checks. Let's switch these messages to say "missing or corrupted", which is exactly what the midx code says for the same cases. Likewise, we'll use the same phrasing and capitalization as those for consistency. And while we're here, we can mark them for translation (just like the midx ones). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-09 19:07:53 +09:00
Jeff King	92de4c5d56	commit-graph: drop redundant call to "lite" verification The idea of verify_commit_graph_lite() is to have cheap verification checks both for everyday use of the graph files (to avoid out of bounds reads, etc) as well as for doing a full check via "commit-graph verify" (which will also check the hash, etc). But the expensive verification checks operate on a commit_graph struct, which we get by using the normal everyday-reader code! So any problem we'd find by calling it would have been found before we even got to the verify_one_commit_graph() function. Removing it simplifies the code a bit, but also frees us up to move the "lite" verification steps around within that everyday-reader code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-09 19:07:53 +09:00
Jeff King	4bc6d43271	commit-graph: handle overflow in chunk_size checks We check the size of chunks with fixed records by multiplying the width of each record by the number of commits in the file. Like: if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH) If this multiplication overflows, we may not notice a chunk is too small (which could later lead to out-of-bound reads). In the current code this is only possible for the CDAT chunk, but the reasons are quite subtle. We compute g->num_commits by dividing the size of the OIDL chunk by the hash length (since it consists of a bunch of hashes). So we know that any size_t multiplication that uses a value smaller than the hash length cannot overflow. And the CDAT records are the only ones that are larger (the others are just 4-byte records). So it's worth fixing all of these, to make it clear that they're not subject to overflow (without having to reason about seemingly unrelated code). The obvious thing to do is add an st_mult(), like: if (chunk_size != st_mult(g->num_commits, GRAPH_DATA_WIDTH)) And that certainly works, but it has one downside: if we detect an overflow, we'll immediately die(). But the commit graph is an optional file; if we run into other problems loading it, we'll generally return an error and fall back to accessing the full objects. Using st_mult() means a malformed file will abort the whole process. So instead, we can do a division like this: if (chunk_size / GRAPH_DATA_WIDTH != g->num_commits) where there's no possibility of overflow. We do lose a little bit of precision; due to integer division truncation we'd allow up to an extra GRAPH_DATA_WIDTH-1 bytes of data in the chunk. That's OK. Our main goal here is making sure we don't have too _few_ bytes, which would cause an out-of-bounds read (we could actually replace our "!=" with "<", but I think it's worth being a little pedantic, as a large mismatch could be a sign of other problems). I didn't add a test here. We'd need to generate a very large graph file in order to get g->num_commits large enough to cause an overflow. And a later patch in this series will use this same division technique in a way that is much easier to trigger in the tests. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-09 19:07:52 +09:00
Junio C Hamano	42b87f7ee6	Merge branch 'ps/do-not-trust-commit-graph-blindly-for-existence' The codepath to traverse the commit-graph learned to notice that a commit is missing (e.g., corrupt repository lost an object), even though it knows something about the commit (like its parents) from what is in commit-graph. * ps/do-not-trust-commit-graph-blindly-for-existence: commit: detect commits that exist in commit-graph but not in the ODB commit-graph: introduce envvar to disable commit existence checks	2023-11-08 11:03:59 +09:00
Patrick Steinhardt	e04838ea82	commit-graph: introduce envvar to disable commit existence checks Our `lookup_commit_in_graph()` helper tries to look up commits from the commit graph and, if it doesn't exist there, falls back to parsing it from the object database instead. This is intended to speed up the lookup of any such commit that exists in the database. There is an edge case though where the commit exists in the graph, but not in the object database. To avoid returning such stale commits the helper function thus double checks that any such commit parsed from the graph also exists in the object database. This makes the function safe to use even when commit graphs aren't updated regularly. We're about to introduce the same pattern into other parts of our code base though, namely `repo_parse_commit_internal()`. Here the extra sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was a newly introduced helper, and as such there was no performance hit by adding this sanity check. If we added `repo_parse_commit_internal()` with that sanity check right from the beginning as well, this would probably never have been an issue to begin with. But by retrofitting it with this sanity check now we do add a performance regression to preexisting code, and thus there is a desire to avoid this or at least give an escape hatch. In practice, there is no inherent reason why either of those functions should have the sanity check whereas the other one does not: either both of them are able to detect this issue or none of them should be. This also means that the default of whether we do the check should likely be the same for both. To err on the side of caution, we thus rather want to make `repo_parse_commit_internal()` stricter than to loosen the checks that we already have in `lookup_commit_in_graph()`. The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is the default, we will double check that commits looked up in the commit graph via `lookup_commit_in_graph()` also exist in the object database. This same check will also be added in `repo_parse_commit_internal()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-11-01 12:04:06 +09:00
Junio C Hamano	f32af12cee	Merge branch 'jk/chunk-bounds' The codepaths that read "chunk" formatted files have been corrected to pay attention to the chunk size and notice broken files. * jk/chunk-bounds: (21 commits) t5319: make corrupted large-offset test more robust chunk-format: drop pair_chunk_unsafe() commit-graph: detect out-of-order BIDX offsets commit-graph: check bounds when accessing BIDX chunk commit-graph: check bounds when accessing BDAT chunk commit-graph: bounds-check generation overflow chunk commit-graph: check size of generations chunk commit-graph: bounds-check base graphs chunk commit-graph: detect out-of-bounds extra-edges pointers commit-graph: check size of commit data chunk midx: check size of revindex chunk midx: bounds-check large offset chunk midx: check size of object offset chunk midx: enforce chunk alignment on reading midx: check size of pack names chunk commit-graph: check consistency of fanout table midx: check size of oid lookup chunk commit-graph: check size of oid fanout chunk midx: stop ignoring malformed oid fanout chunk t: add library for munging chunk-format files ...	2023-10-23 13:56:36 -07:00
Junio C Hamano	a45eddec40	Merge branch 'jk/commit-graph-leak-fixes' Leakfix. * jk/commit-graph-leak-fixes: commit-graph: clear oidset after finishing write commit-graph: free write-context base_graph_name during cleanup commit-graph: free write-context entries before overwriting commit-graph: free graph struct that was not added to chain commit-graph: delay base_graph assignment in add_graph_to_chain() commit-graph: free all elements of graph chain commit-graph: move slab-clearing to close_commit_graph() merge: free result of repo_get_merge_bases() commit-reach: free temporary list in get_octopus_merge_bases() t6700: mark test as leak-free	2023-10-13 14:18:28 -07:00
Jeff King	581e0f8b18	commit-graph: check bounds when accessing BIDX chunk We load the bloom_filter_indexes chunk using pair_chunk(), so we have no idea how big it is. This can lead to out-of-bounds reads if it is smaller than expected, since we index it based on the number of commits found elsewhere in the graph file. We can check the chunk size up front, like we do for CDAT and other chunks with one fixed-size record per commit. The test case demonstrates the problem. It actually won't segfault, because we end up reading random data from the follow-on chunk (BDAT in this case), and the bounds checks added in the previous patch complain. But this is by no means assured, and you can craft a commit-graph file with BIDX at the end (or a smaller BDAT) that does segfault. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	920f400e91	commit-graph: check bounds when accessing BDAT chunk When loading Bloom filters from a commit-graph file, we use the offset values in the BIDX chunk to index into the memory mapped for the BDAT chunk. But since we don't record how big the BDAT chunk is, we just trust that the BIDX offsets won't cause us to read outside of the chunk memory. A corrupted or malicious commit-graph file will cause us to segfault (in practice this isn't a very interesting attack, since commit-graph files are local-only, and the worst case is an out-of-bounds read). We can't fix this by checking the chunk size during parsing, since the data in the BDAT chunk doesn't have a fixed size (that's why we need the BIDX in the first place). So we'll fix it in two parts: 1. Record the BDAT chunk size during parsing, and then later check that the BIDX offsets we look up are within bounds. 2. Because the offsets are relative to the end of the BDAT header, we must also make sure that the BDAT chunk is at least as large as the expected header size. Otherwise, we overflow when trying to move past the header, even for an offset of "0". We can check this early, during the parsing stage. The error messages are rather verbose, but since this is not something you'd expect to see outside of severe bugs or corruption, it makes sense to err on the side of too many details. Sadly we can't mention the filename during the chunk-parsing stage, as we haven't set g->filename at this point, nor passed it down through the stack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	ee6a792412	commit-graph: bounds-check generation overflow chunk If the generation entry in a commit-graph doesn't fit, we instead insert an offset into a generation overflow chunk. But since we don't record the size of the chunk, we may read outside the chunk if the offset we find on disk is malicious or corrupted. We can't check the size of the chunk up-front; it will vary based on how many entries need overflow. So instead, we'll do a bounds-check before accessing the chunk memory. Unfortunately there is no error-return from this function, so we'll just have to die(), which is what it does for other forms of corruption. As with other cases, we can drop the st_mult() call, since we know our bounds-checked value will fit within a size_t. Before this patch, the test here actually "works" because we read garbage data from the next chunk. And since that garbage data happens not to provide a generation number which changes the output, it appears to work. We could construct a case that actually segfaults or produces wrong output, but it would be a bit tricky. For our purposes its sufficient to check that we've detected the bounds error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	4a3c34662b	commit-graph: check size of generations chunk We neither check nor record the size of the generations chunk we parse from a commit-graph file. This should have one uint32_t for each commit in the file; if it is smaller (due to corruption, etc), we may read outside the mapped memory. The included test segfaults without this patch, as it shrinks the size considerably (and the chunk is near the end of the file, so we read off the end of the array rather than accidentally reading another chunk). We can fix this by checking the size up front (like we do for other fixed-size chunks, like CDAT). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	6cf61d0db5	commit-graph: bounds-check base graphs chunk When we are loading a commit-graph chain, we check that each slice of the chain points to the appropriate set of base graphs via its BASE chunk. But since we don't record the size of the chunk, we may access out-of-bounds memory if the file is corrupted. Since we know the number of entries we expect to find (based on the position within the commit-graph-chain file), we can just check the size up front. In theory this would also let us drop the st_mult() call a few lines later when we actually access the memory, since we know that the computed offset will fit in a size_t. But because the operands "g->hash_len" and "n" have types "unsigned char" and "int", we'd have to cast to size_t first. Leaving the st_mult() does that cast, and makes it more obvious that we don't have an overflow problem. Note that the test does not actually segfault before this patch, since it just reads garbage from the chunk after BASE (and indeed, it even rejects the file because that garbage does not have the expected hash value). You could construct a file with BASE at the end that did segfault, but corrupting the existing one is easy, and we can check stderr for the expected message. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	9622610e55	commit-graph: detect out-of-bounds extra-edges pointers If an entry in a commit-graph file has more than 2 parents, the fixed-size parent fields instead point to an offset within an "extra edges" chunk. We blindly follow these, assuming that the chunk is present and sufficiently large; this can lead to an out-of-bounds read for a corrupt or malicious file. We can fix this by recording the size of the chunk and adding a bounds-check in fill_commit_in_graph(). There are a few tricky bits: 1. We'll switch from working with a pointer to an offset. This makes some corner cases just fall out naturally: a. If we did not find an EDGE chunk at all, our size will correctly be zero (so everything is "out of bounds"). b. Comparing "size / 4" lets us make sure we have at least 4 bytes to read, and we never compute a pointer more than one element past the end of the array (computing a larger pointer is probably OK in practice, but is technically undefined behavior). c. The current code casts to "uint32_t ". Replacing it with an offset avoids any comparison between different types of pointer (since the chunk is stored as "unsigned char "). 2. This is the first case in which fill_commit_in_graph() may return anything but success. We need to make sure to roll back the "parsed" flag (and any parents we might have added before running out of buffer) so that the caller can cleanly fall back to loading the commit object itself. It's a little non-trivial to do this, and we might benefit from factoring it out. But we can wait on that until we actually see a second case where we return an error. As a bonus, this lets us drop the st_mult() call. Since we've already done a bounds check, we know there won't be any integer overflow (it would imply our buffer is larger than a size_t can hold). The included test does not actually segfault before this patch (though you could construct a case where it does). Instead, it reads garbage from the next chunk which results in it complaining about a bogus parent id. This is sufficient for our needs, though (we care that the fallback succeeds, and that stderr mentions the out-of-bounds read). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	b72df612af	commit-graph: check size of commit data chunk We expect a commit-graph file to have a fixed-size data record for each commit in the file (and we know the number of commits to expct from the size of the lookup table). If we encounter a file where this is too small, we'll look past the end of the chunk (and possibly even off the mapped memory). We can fix this by checking the size up front when we record the pointer. The included test doesn't segfault, since it ends up reading bytes from another chunk. But it produces nonsense results, since the values it reads are garbage. Our test notices this by comparing the output to a non-corrupted run of the same command (and of course we also check that the expected error is printed to stderr). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	c9b9fefc13	midx: enforce chunk alignment on reading The midx reader assumes chunks are aligned to a 4-byte boundary: we treat the fanout chunk as an array of uint32_t, indexing it to feed the results to ntohl(). Without aligning the chunks, we may violate the CPU's alignment constraints. Though many platforms allow this, some do not. And certanily UBSan will complain, since it is undefined behavior. Even though most chunks are naturally 4-byte-aligned (because they are storing uint32_t or larger types), PNAM is not. It stores NUL-terminated pack names, so you can have a valid chunk with any length. The writing side handles this by 4-byte-aligning the chunk, introducing a few extra NULs as necessary. But since we don't check this on the reading side, we may end up with a misaligned fanout and trigger the undefined behavior. We have two options here: 1. Swap out ntohl(fanout[i]) for get_be32(fanout+i) everywhere. The latter handles alignment itself. It's possible that it's slightly slower (though in practice I'm not sure how true that is, especially for these code paths which then go on to do a binary search). 2. Enforce the alignment when reading the chunks. This is easy to do, since the table-of-contents reader can check it in one spot. I went with the second option here, just because it places less burden on maintenance going forward (it is OK to continue using ntohl), and we know it can't have any performance impact on the actual reads. The commit-graph code uses the same chunk API. It's usually also 4-byte aligned, but some chunks are not (like Bloom filter BDAT chunks). So we'll pass "1" here to allow any alignment. It doesn't suffer from the same problem as midx with its fanout because the fanout chunk is always the first (and the rest of the format dictates that the first chunk will start aligned). The new test shows the effect on a midx with a misaligned PNAM chunk. Note that the midx-reading code treats chunk-toc errors as soft, falling back to the non-midx path rather than calling die(), as we do for other parsing errors. Arguably we should make all of these behave the same, but that's out of scope for this patch. For now the test just expects the fallback behavior. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:01 -07:00
Jeff King	4169d89645	commit-graph: check consistency of fanout table We use bsearch_hash() to look up items in the oid index of a commit-graph. It also has a fanout table to reduce the initial range in which we'll search. But since the fanout comes from the on-disk file, a corrupted or malicious file can cause us to look outside of the allocated index memory. One solution here would be to pass the total table size to bsearch_hash(), which could then bounds check the values it reads from the fanout. But there's an inexpensive up-front check we can do, and it's the same one used by the midx and pack idx code (both of which likewise have fanout tables and use bsearch_hash(), but are not affected by this bug): 1. We can check the value of the final fanout entry against the size of the table we got from the index chunk. These must always match, since the fanout is just slicing up the index. As a side note, the midx and pack idx code compute it the other way around: they use the final fanout value as the object count, and check the index size against it. Either is valid; if they disagree we cannot know which is wrong (a corrupted fanout value, or a too-small table of oids). 2. We can quickly scan the fanout table to make sure it is monotonically increasing. If it is, then we know that every value is less than or equal to the final value, and therefore less than or equal to the table size. It would also be sufficient to just check that each fanout value is smaller than the final one, but the midx and pack idx code both do a full monotonicity check. It's the same cost, and it catches some other corruptions (though not all; the checks done by "commit-graph verify" are more complete but more expensive, and our goal here is to be fast and memory-safe). There are two new tests. One just checks the final fanout value (this is the mirror image of the "too small oid lookup" case added for the midx in the previous commit; it's flipped here because commit-graph considers the oid lookup chunk to be the source of truth). The other actually creates a fanout with many out-of-bounds entries, and prior to this patch, it does cause the segfault you'd expect. But note that the error is not "your fanout entry is out-of-bounds", but rather "fanout value out of order". That's because we leave the final fanout value in place (to get past the table size check), making the index non-monotonic (the second-to-last entry is big, but the last one must remain small to match the actual table). We need adjustments to a few existing tests, as well: - an earlier test in t5318 corrupts the fanout and runs "commit-graph verify". Its message is now changed, since we catch the problem earlier (during the load step, rather than the careful validation step). - in t5324, we test that "commit-graph verify --shallow" does not do expensive verification on the base file of the chain. But the corruption it uses (munging a byte at offset 1000) happens to be in the middle of the fanout table. And now we detect that problem in the cheaper checks that are performed for every part of the graph. We'll push this back to offset 1500, which is only caught by the more expensive checksum validation. Likewise, there's a later test in t5324 which munges an offset 100 bytes into a file (also in the fanout table) that is referenced by an alternates file. So we now find that corruption during the load step, rather than the verification step. At the very least we need to change the error message (like the case above in t5318). But it is probably good to make sure we handle all parts of the verification even for alternate graph files. So let's likewise corrupt byte 1500 and make sure we found the invalid checksum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Jeff King	52e2e8d43d	commit-graph: check size of oid fanout chunk We load the oid fanout chunk with pair_chunk(), which means we never see the size of the chunk. We just assume the on-disk file uses the appropriate size, and if it's too small we'll access random memory. It's easy to check this up-front; the fanout always consists of 256 uint32's, since it is a fanout of the first byte of the hash pointing into the oid index. These parameters can't be changed without introducing a new chunk type. This matches the similar check in the midx OIDF chunk (but note that rather than checking for the error immediately, the graph code just leaves parts of the struct NULL and checks for required fields later). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Jeff King	570b8b8836	chunk-format: note that pair_chunk() is unsafe The pair_chunk() function is provided as an easy helper for parsing chunks that just want a pointer to a set of bytes. But every caller has a hidden bug: because we return only the pointer without the matching chunk size, the callers have no clue how many bytes they are allowed to look at. And as a result, they may read off the end of the mmap'd data when the on-disk file does not match their expectations. Since chunk files are typically used for local-repository data like commit-graph files and midx's, the security implications here are pretty mild. The worst that can happen is that you hand somebody a corrupted repository tarball, and running Git on it does an out-of-bounds read and crashes. So it's worth being more defensive, but we don't need to drop everything and fix every caller immediately. I noticed the problem because the pair_chunk_fn() callback does not look at its chunk_size argument, and wanted to annotate it to silence -Wunused-parameter. We could do that now, but we'd lose the hint that this code should be audited and fixed. So instead, let's set ourselves up for going down that path: 1. Provide a pair_chunk() function that does return the size, which prepares us for fixing these cases. 2. Rename the existing function to pair_chunk_unsafe(). That gives us an easy way to grep for cases which still need to be fixed, and the name should cause anybody adding new calls to think twice before using it. There are no callers of the "safe" version yet, but we'll add some in subsequent patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-09 15:55:00 -07:00
Jeff King	d9c84c6d67	commit-graph: free write-context base_graph_name during cleanup Commit `6c622f9f0b` (commit-graph: write commit-graph chains, 2019-06-18) added a base_graph_name string to the write_commit_graph_context struct. But the end-of-function cleanup forgot to free it, causing a leak. This (presumably in combination with the preceding leak-fixes) lets us mark t5328 as leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:24 -07:00
Jeff King	274bfa7f28	commit-graph: free write-context entries before overwriting When writing a split graph file, we replace the final element of the commit_graph_hash_after and commit_graph_filenames_after arrays. But since these are allocated strings, we need to free them before overwriting to avoid leaking the old string. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:24 -07:00
Jeff King	1d94abfe1e	commit-graph: free graph struct that was not added to chain When reading the graph chain file, we open (and allocate) each individual slice it mentions and then add them to a linked-list chain. But if adding to the chain fails (e.g., because the base-graph chunk it contains didn't match what we expected), we leave the function without freeing the graph struct that caused the failure, leaking it. We can fix it by calling free_graph_commit(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:24 -07:00
Jeff King	991d549f74	commit-graph: delay base_graph assignment in add_graph_to_chain() When adding a graph to a chain, we do some consistency checks and then if everything looks good, set g->base_graph to add a link to the chain. But when we added a new consistency check in `209250ef38` (commit-graph.c: prevent overflow in add_graph_to_chain(), 2023-07-12), it comes _after_ we've already set g->base_graph. So we might return failure, even though we actually added to the chain. This hasn't caused a bug yet, because after failing to add to the chain, we discard the failed graph struct completely, leaking it. But in order to fix that, it's important that the struct be in a consistent and predictable state after the failure. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:24 -07:00
Jeff King	09a75abba4	commit-graph: free all elements of graph chain When running "commit-graph verify", we call free_commit_graph(). That's sufficient for the case of a single graph file, but if we loaded a chain of split graph files, they form a linked list via the base_graph pointers. We need to free all of them, or we leak all but the first struct. We can make this work by teaching free_commit_graph() to walk the base_graph pointers and free each element. This in turn lets us simplify close_commit_graph(), which does the same thing by recursion (we cannot just use close_commit_graph() in "commit-graph verify", as the function takes a pointer to an object store, and the verify command creates a single one-off graph struct). While indenting the code in free_commit_graph() for the loop, I noticed that setting g->data to NULL is rather pointless, as we free the struct a few lines later. So I cleaned that up while we're here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:24 -07:00
Jeff King	ac6d45d11f	commit-graph: move slab-clearing to close_commit_graph() When closing and freeing a commit-graph, the main entry point is close_commit_graph(), which then uses close_commit_graph_one() to recurse through the base_graph links and free each one. Commit `957ba814bf` (commit-graph: when closing the graph, also release the slab, 2021-09-08) put the call to clear the slab into the recursive function, but this is pointless: there's only a single global slab variable. It works OK in practice because clearing the slab is idempotent, but it makes the code harder to reason about and refactor. Move it into the parent function so it's only called once (and there are no other direct callers of the recursive close_commit_graph_one(), so we are not hurting them). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-03 14:28:24 -07:00
Jeff King	5f259197ee	commit-graph: report incomplete chains during verification The load_commit_graph_chain_fd_st() function will stop loading chains when it sees an error. But if it has loaded any graph slice at all, it will return it. This is a good thing for normal use (we use what data we can, and this is just an optimization). But it's a bad thing for "commit-graph verify", which should be careful about finding any irregularities. We do complain to stderr with a warning(), but the verify command still exits with a successful return code. The new tests here cover corruption of both the base and tip slices of the chain. The corruption of the base file already works (it is the first file we look at, so when we see the error we return NULL). The "tip" case is what is fixed by this patch (it complains to stderr but still returns the base slice). Likewise the existing tests for corruption of the commit-graph-chain file itself need to be updated. We already exited non-zero correctly for the "base" case, but the "tip" case can now do so, too. Note that this also causes us to adjust a test later in the file that similarly corrupts a tip (though confusingly the test script calls this "base"). It checks stderr but erroneously expects the whole "verify" command to exit with a successful code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-28 07:00:43 -07:00
Jeff King	7754a565e2	commit-graph: tighten chain size check When we open a commit-graph-chain file, if it's smaller than a single entry, we just quietly treat that as ENOENT. That make some sense if the file is truly zero bytes, but it means that "commit-graph verify" will quietly ignore a file that contains garbage if that garbage happens to be short. Instead, let's only simulate ENOENT when the file is truly empty, and otherwise return EINVAL. The normal graph-loading routines don't care, but "commit-graph verify" will notice and complain about the difference. It's not entirely clear to me that the 0-is-ENOENT case actually happens in real life, so we could perhaps just eliminate this special-case altogether. But this is how we've always behaved, so I'm preserving it in the name of backwards compatibility (though again, it really only matters for "verify", as the regular routines are happy to load what they can). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-28 07:00:43 -07:00
Jeff King	8298b54317	commit-graph: check mixed generation validation when loading chain file In read_commit_graph_one(), we call validate_mixed_generation_chain() after loading the graph. Even though we don't check the return value, this has the side effect of clearing the read_generation_data flag, which is important when working with mixed generation numbers. But doing this in load_commit_graph_chain_fd_st() makes more sense: 1. We are calling it even when we did not load a chain at all, which is pointless (you cannot have mixed generations in a single file). 2. For now, all callers load the graph via read_commit_graph_one(). But the point of factoring out the open/load in the previous commit was to let "commit-graph verify" call them separately. So it needs to trigger this function as part of the load. Without this patch, the mixed-generation tests in t5324 would start failing on "git commit-graph verify" calls, once we switch to using a separate open/load call there. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-28 07:00:43 -07:00
Jeff King	7ed76b4eb2	commit-graph: factor out chain opening function The load_commit_graph_chain() function opens the chain file and all of the slices of graph that it points to. If there is no chain file (which is a totally normal condition), we return NULL. But if we run into errors with the chain file or loading the actual graph data, we also return NULL, and the caller cannot tell the difference. The caller can check for ENOENT for the unremarkable "no such file" case. But I'm hesitant to assume that the rest of the function would never accidentally set errno to ENOENT itself, since it is opening the slice files (and that would mean the caller fails to notice a real error). So let's break this into two functions: one to open the file, and one to actually load it. This matches the interface we provide for the non-chain graph file, which will also come in handy in a moment when we fix some bugs in the "git commit-graph verify" code. Some notes: - I've kept the "1 is good, 0 is bad" return convention (and the weird "fd" out-parameter) used by the matching open_commit_graph() function and other parts of the commit-graph code. This is unlike most of the rest of Git (which would just return the fd, with -1 for error), but it makes sense to stay consistent with the adjacent bits of the API here. - The existing chain loading function will quietly return if the file is too small to hold a single entry. I've retained that behavior (and explicitly set ENOENT in the opener function) for now, under the notion that it's probably valid (though I'd imagine unusual) to have an empty chain file. There are two small behavior changes here, but I think both are strictly positive: 1. The original blindly did a stat() before checking if fopen() succeeded, meaning we were making a pointless extra stat call. 2. We now use fstat() to check the file size. The previous code using a regular stat() on the pathname meant we could technically race with somebody updating the chain file, and end up with a size that does not match what we just opened with fopen(). I doubt anybody ever hit this in practice, but it may have caused an out-of-bounds read. We'll retain the load_commit_graph_chain() function which does both the open and reading steps (most existing callers do not care about seeing errors anyway, since loading commit-graphs is optimistic). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-09-28 07:00:43 -07:00
Junio C Hamano	25ff15d108	Merge branch 'jk/unused-post-2.42' Unused parameters to functions are marked as such, and/or removed, in order to bring us closer to -Wunused-parameter clean. * jk/unused-post-2.42: (22 commits) update-ref: mark unused parameter in parser callbacks gc: mark unused descriptors in scheduler callbacks bundle-uri: mark unused parameters in callbacks fetch: mark unused parameter in ref_transaction callback credential: mark unused parameter in urlmatch callback grep: mark unused parmaeters in pcre fallbacks imap-send: mark unused parameters with NO_OPENSSL worktree: mark unused parameters in noop repair callback negotiator/noop: mark unused callback parameters add-interactive: mark unused callback parameters grep: mark unused parameter in output function test-trace2: mark unused argv/argc parameters trace2: mark unused config callback parameter trace2: mark unused us_elapsed_absolute parameters stash: mark unused parameter in diff callback ls-tree: mark unused parameter in callback commit-graph: mark unused data parameters in generation callbacks worktree: mark unused parameters in each_ref_fn callback pack-bitmap: mark unused parameters in show_object callback ref-filter: mark unused parameters in parser callbacks ...	2023-09-07 15:06:07 -07:00
Jeff King	e1cba404db	commit-graph: mark unused data parameters in generation callbacks The compute_generation_info code uses function pointers to abstract the get/set generation operations. Some callers don't need the extra void data pointer, which should be annotated to appease -Wunused-parameter. Note that we can drop the assignment of the "data" parameter in compute_generation_numbers(), as we've just shown that neither of the callbacks it uses will access it. This matches the caller in ensure_generations_valid(), which already does not bother to set "data". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-08-29 17:56:24 -07:00
Taylor Blau	db6044d762	commit-graph: avoid repeated mixed generation number warnings When validating that a commit-graph has either all zero, or all non-zero generation numbers, we emit a warning on both the rising and falling edge of transitioning between the two. So if we are unfortunate enough to see a commit-graph which has a repeating sequence of zero, then non-zero generation numbers, we'll generate many warnings that contain more or less the same information. Avoid this by keeping track of a single example for a commit with zero- and non-zero generation, and emit a single warning at the end of verification if both are non-NULL. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-08-21 16:16:23 -07:00
Jeff King	cc9c9a00a5	commit-graph: verify swapped zero/non-zero generation cases In verify_one_commit_graph(), we have code that complains when a commit is found with a generation number of zero, and then later with a non-zero number. It works like this: 1. When we see an entry with generation zero, we set the generation_zero flag to GENERATION_ZERO_EXISTS. 2. When we later see an entry with a non-zero generation, we complain if the flag is GENERATION_ZERO_EXISTS. There's a matching GENERATION_NUMBER_EXISTS value, which in theory would be used to find the case that we see the entries in the opposite order: 1. When we see an entry with a non-zero generation, we set the generation_zero flag to GENERATION_NUMBER_EXISTS. 2. When we later see an entry with a zero generation, we complain if the flag is GENERATION_NUMBER_EXISTS. But that doesn't work; step 2 is implemented, but there is no step 1. We never use NUMBER_EXISTS at all, and Coverity rightly complains that step 2 is dead code. We can fix that by implementing that step 1. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-08-21 16:07:33 -07:00
Taylor Blau	868c991155	commit-graph: introduce `commit_graph_generation_from_graph()` In `2ee11f7261` (commit-graph: return generation from memory, 2023-03-20), the `commit_graph_generation()` function stopped returning zeros when asked to locate the generation number of a given commit. This was done at the time to prepare for a later change which set generation values in memory, meaning that we could no longer rely on `graph_pos` alone to tell us whether or not to trust the generation number returned by this function. In `2ee11f7261`, it was noted that this change only impacted very old commit-graphs, which were written with all commits having generation number 0. Indeed, zero is not a valid generation number, so we should never expect to see that value outside of the aforementioned case. The test fallout in `2ee11f7261` indicated that we were no longer able to fsck a specific old case of commit-graph corruption, where we see a non-zero generation number after having seen a generation number of 0 earlier. Introduce a variant of `commit_graph_generation()` which behaves like that function did prior to `2ee11f7261`, known as `commit_graph_generation_from_graph()`. Then use this function in the context of `verify_one_commit_graph()`, where we only want to trust the values from the graph. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-08-21 16:07:33 -07:00
Junio C Hamano	4488bb3bed	Merge branch 'tb/object-access-overflow-protection' Various offset computation in the code that accesses the packfiles and other data in the object layer has been hardened against arithmetic overflow, especially on 32-bit systems. * tb/object-access-overflow-protection: commit-graph.c: prevent overflow in `verify_commit_graph()` commit-graph.c: prevent overflow in `write_commit_graph()` commit-graph.c: prevent overflow in `merge_commit_graph()` commit-graph.c: prevent overflow in `split_graph_merge_strategy()` commit-graph.c: prevent overflow in `load_tree_for_commit()` commit-graph.c: prevent overflow in `fill_commit_in_graph()` commit-graph.c: prevent overflow in `fill_commit_graph_info()` commit-graph.c: prevent overflow in `load_oid_from_graph()` commit-graph.c: prevent overflow in add_graph_to_chain() commit-graph.c: prevent overflow in `write_commit_graph_file()` pack-bitmap.c: ensure that eindex lookups don't overflow midx.c: prevent overflow in `fill_included_packs_batch()` midx.c: prevent overflow in `write_midx_internal()` midx.c: store `nr`, `alloc` variables as `size_t`'s midx.c: prevent overflow in `nth_midxed_offset()` midx.c: prevent overflow in `nth_midxed_object_oid()` midx.c: use `size_t`'s for fanout nr and alloc packfile.c: use checked arithmetic in `nth_packed_object_offset()` packfile.c: prevent overflow in `load_idx()` packfile.c: prevent overflow in `nth_packed_object_id()`	2023-07-25 12:05:23 -07:00

1 2 3 4 5 ...

423 Commits