development/git - HydraGit

mirror of https://github.com/git/git synced 2024-08-27 03:29:21 +00:00

Author	SHA1	Message	Date
Taylor Blau	51c31a6408	commit-graph.c: prevent overflow in `load_tree_for_commit()` In a similar spirit as previous commits, ensure that we don't overflow when computing an offset into the commit_data chunk when the (relative) graph position exceeds 2^32-1/GRAPH_DATA_WIDTH. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	50a71c2942	commit-graph.c: prevent overflow in `fill_commit_in_graph()` In a similar spirit as previous commits, ensure that we don't overflow when the lex_index of the commit we are trying to fill out exceeds 2^32-1/(g->hash_len+16). The other hunk touched in this patch is not susceptible to overflow, since an explicit cast is made to a 64-bit unsigned value. For clarity and consistency with the rest of the commits in this series, avoid a tricky to reason about cast, and use `st_mult()` directly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	2740ed1c76	commit-graph.c: prevent overflow in `fill_commit_graph_info()` In a similar spirit as previous commits, ensure that we don't overflow in a few spots within `fill_commit_graph_info()`: - First, when computing an offset into the commit data chunk, which can occur when the `lex_index` of the item we're looking up exceeds 2^32-1/GRAPH_DATA_WIDTH. - A similar issue when computing the generation date offset for commits with `lex_index` greater than 2^32-1/4. Note that in practice this will never overflow, since the left-hand operand is from calling `sizeof(...)` and is thus already a `size_t`. But wrap that in an `st_mult()` to make it clear that we intend to perform this computation using 64-bit operands. - Finally, a nearly identical issue as above when computing an offset into the `generation_data_overflow` chunk. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	0bd8f30a0e	commit-graph.c: prevent overflow in `load_oid_from_graph()` In a similar spirit as previous commits, ensure that we don't overflow when trying to compute an offset into the `chunk_oid_lookup` table when the `lex_index` of the item we're trying to look up exceeds `2^32-1/g->hash_len`. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	209250ef38	commit-graph.c: prevent overflow in add_graph_to_chain() The commit-graph uses a fanout table with 4-byte entries to store the number of commits at each shard of the commit-graph. So it is OK to have a commit graph with as many as 2^32-1 stored commits. But we risk overflowing any computation which may exceed the 32-bit (unsigned) maximum when those computations are (incorrectly) performed using 32-bit operands. There are a couple of spots in `add_graph_to_chain()` where we could potentially overflow the result: - First, when comparing the list of existing entries in the commit-graph chain. It is unlikely that this should ever overflow, since it would require having roughly 2^32-1/g->hash_len commit-graphs in the chain. But let's guard that computation with a `st_mult()` just to be safe. - Second, when computing the number of commits in the graph added to the front of the chain. This value is also a 32-bit unsigned, but we should make sure that it does not grow beyond the maximum value. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	48f3f8cf37	commit-graph.c: prevent overflow in `write_commit_graph_file()` When writing a commit-graph, we use the chunk-format API to write out each individual chunk of the commit-graph. Each chunk of the commit-graph is tracked via a call to `add_chunk()`, along with the expected size of that chunk. Similar to an earlier commit which handled the identical issue in the MIDX machinery, guard against overflow when dealing with a commit-graph with a large number of entries to avoid corrupting the contents of the commit-graph itself. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	0948c50176	pack-bitmap.c: ensure that eindex lookups don't overflow When a bitmap is used to answer some reachability query, it creates a pseudo-bitmap called the "extended index" on top of any existing bitmaps to store objects that are relevant to the query, but not mentioned in the bitmap. When looking up the ith object in the extended index in a bitmap, it is common to write something like: bitmap_get(result, i + bitmap_num_objects(bitmap_git)) , indicating that we want the ith object following all other objects mentioned in the bitmap_git. Since the type of `i` and the return type of `bitmap_num_objects()` are both `uint32_t`s, But if there are either a large number of objects in the bitmap, or a large number of objects in the extended index (or both), this addition can overflow when the sum is greater than 2^32-1. Having that large of a bitmap position is entirely acceptable, but we need to ensure that the computed bitmap position for that object is performed using 64-bits and doesn't overflow. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	d67609bdde	midx.c: prevent overflow in `fill_included_packs_batch()` In a similar spirit as in previous commits, avoid an integer overflow when computing the expected size of a MIDX. (Note that this is also OK as-is, since `p->pack_size` is an `off_t`, so this computation should already be done as 64-bit integers. But again, let's use `st_mult()` to make this fact clear). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	2bc764c1d4	midx.c: prevent overflow in `write_midx_internal()` When writing a MIDX, we use the chunk-format API to write out each individual chunk of the MIDX. Each chunk of the MIDX is tracked via a call to `add_chunk()`, along with the expected size of that chunk. Guard against overflow when dealing with a MIDX with a large number of entries (and consequently, large chunks within the MIDX file itself) to avoid corrupting the contents of the MIDX itself. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	cc38127439	midx.c: store `nr`, `alloc` variables as `size_t`'s In the `write_midx_context` structure, we use two `uint32_t`'s to track the length and allocated size of the packs, and one `uint32_t` to track the number of objects in the MIDX. In practice, having these be 32-bit unsigned values shouldn't cause any problems since we are unlikely to have that many objects or packs in any real-world repository. But these values should be `size_t`'s, so change their type to reflect that. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	5675150cc3	midx.c: prevent overflow in `nth_midxed_offset()` In a similar spirit as previous patches, avoid an overflow when looking up object offsets in the MIDX's large offset table by guarding the computation via `st_mult()`. This instance is also OK as-is, since the left operand is the result of `sizeof(...)`, which is already a `size_t`. But use `st_mult()` instead here to make it explicit that this computation is to be performed using 64-bit unsigned integers. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	c2b24ede22	midx.c: prevent overflow in `nth_midxed_object_oid()` In a similar spirit as previous commits, avoid overflow when looking up an object's OID in a MIDX when its position is greater than `2^32-1/m->hash_len`. As usual, it is perfectly OK for a MIDX to have as many as 2^32-1 objects (since we use 32-bit fields to count the number of objects at each fanout layer). But if we have more than `2^32-1/m->hash_len` number of objects, we will incorrectly perform the computation using 32-bit integers, overflowing the result. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	e6c71f239d	midx.c: use `size_t`'s for fanout nr and alloc The `midx_fanout` struct is used to keep track of a set of OIDs corresponding to each layer of the MIDX's fanout table. It stores an array of entries, along with the number of entries in the table, and the allocated size of the array. Both `nr` and `alloc` are stored as 32-bit unsigned integers. In practice, this should never cause any problems, since most packs have far fewer than 2^32-1 objects. But storing these as `size_t`'s is more appropriate, and prevents us from accidentally overflowing some result when multiplying or adding to either of these values. Update these struct members to be `size_t`'s as appropriate. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	a519abca02	packfile.c: use checked arithmetic in `nth_packed_object_offset()` In a similar spirit as the previous commits, ensure that we use `st_add()` or `st_mult()` when computing values that may overflow the 32-bit unsigned limit. Note that in each of these instances, we prevent 32-bit overflow already since we have explicit casts to `size_t`. So this code is OK as-is, but let's clarify it by using the `st_xyz()` helpers to make it obvious that we are performing the relevant computations using 64 bits. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:32:03 -07:00
Taylor Blau	42be681b33	packfile.c: prevent overflow in `load_idx()` Prevent an overflow when locating a pack's CRC offset when the number of packed items is greater than 2^32-1/hashsz by guarding the computation with an `st_mult()`. Note that to avoid truncating the result, the `crc_offset` member must itself become a `size_t`. The only usage of this variable (besides the assignment in `load_idx()`) is in `read_v2_anomalous_offsets()` in the index-pack code. There we use the `crc_offset` as a pointer offset, so we are already equipped to handle the type change. Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-14 09:31:34 -07:00
Jeff King	1e9cb3487a	t/helper: mark unused callback void data parameters Many callback interfaces have an extra void data parameter, but we don't always need it (especially for dumping functions like the ones in test helpers). Mark them as unused to avoid -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:24:00 -07:00
Jeff King	cc2f810172	tag: mark unused parameters in each_tag_name_fn callbacks We iterate over the set of input tag names using callbacks. But not all operations need the same inputs, so some parameters go unused (but of course not the same ones for each operation). Mark the unused ones to avoid -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:24:00 -07:00
Jeff King	1e6459efca	rev-parse: mark unused parameter in for_each_abbrev callback We don't need to use the "data" parameter in this instance. Let's mark it to avoid -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:24:00 -07:00
Jeff King	4c7b06f208	replace: mark unused parameter in each_mergetag_fn callback We don't look at the "commit" parameter to our callback, as our "mergetag_data" pointer contains the original name "ref", which we use instead. But we can't get rid of it, since other for_each_mergetag callbacks do use it. Let's mark the parameter to avoid -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:24:00 -07:00
Jeff King	80d4e5f3a5	replace: mark unused parameter in ref callback We don't look at the "flags" parameter, which is natural for something that is just printing the contents of the replace refs. But let's mark it to appease -Wunused-parameter. This probably should have been part of `63e14ee2d6` (refs: mark unused each_ref_fn parameters, 2022-08-19), but I missed it as this one is a repo_each_ref_fn, which takes an extra repository argument. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:24:00 -07:00
Jeff King	ee550abcce	merge-tree: mark unused parameter in traverse callback Our threeway_callback() does not bother to look at its "n" parameter. It is static in this file and used only by trivial_merge_trees(), which always passes 3 trees (hence the name "threeway"). It also does not look at "dirmask". This is OK, as it handles directories specifically by looking at the mode bits. Other traverse_info callbacks need these, so we can't get drop them from the interface. But let's annotate these ones to avoid complaints from -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:24:00 -07:00
Jeff King	0b4e9013f1	fsck: mark unused parameters in various fsck callbacks There are a few callback functions which are used with the fsck code, but it's natural that not all callbacks need all parameters. For reporting, even something as obvious as "the oid of the object which had a problem" is not always used, as some callers are only checking a single object in the first place. And for both reporting and walking, things like void data pointers and the fsck_options aren't always necessary. But since each such parameter is used by _some_ callback, we have to keep them in the interface. Mark the unused ones in specific callbacks to avoid triggering -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:24:00 -07:00
Jeff King	cc88afad62	revisions: drop unused "opt" parameter in "tweak" callbacks The setup_revision_opt struct has a "tweak" function pointer, which can be used to adjust parameters after setup_revisions() parses arguments, but before it finalizes setup. In addition to the rev_info struct, the callback receives a pointer to the setup_revision_opt, as well. But none of the existing callbacks looks at the extra "opt" parameter, leading to -Wunused-parameter warnings. We could mark it as UNUSED, but instead let's remove it entirely. It's conceivable that it could be useful for a callback to have access to the "opt" struct. But in the 13 years that this mechanism has existed, nobody has used it. So let's just drop it in the name of simplifying. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:24:00 -07:00
Jeff King	506d35f13d	count-objects: mark unused parameter in alternates callback Callbacks to for_each_altodb() get a void data pointer, but we don't need it here. Mark it as unused to silence -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:24:00 -07:00
Jeff King	a8a8e75e9e	am: mark unused keep_cr parameters When parsing the input, we have a "keep_cr" parameter to tell us how to handle line endings. But this doesn't apply to stgit or hg patches (which are not mailbox formats where we have to worry about that), so we ignore the parameter entirely in those functions. Let's mark these as unused so that -Wunused-parameter does not complain about them. Note that we could just drop these parameters entirely. They are necessary to conform to the mail_conv_fn interface used by split_mail_conv(), but these two callbacks are the only ones used with that function. The other formats (which _do_ care about keep_cr) use split_mail_mbox(). But it's conceivable that we'd eventually add another format that does care about this option, so let's leave it as part of the generic interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:23:59 -07:00
Jeff King	e519ac35af	http-push: mark unused parameter in xml callback The xml_start_tag() function is passed the expat library's XML_SetElementHandler() function, so it has to conform to the expected interface. But we don't actually care about the attributes list. Mark it so that -Wunused-parameter does not complain. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:23:59 -07:00
Jeff King	d0144007b1	http: mark unused parameters in curl callbacks These functions are all used as callbacks for curl, so they have to conform to a particular interface. But they don't need all of their parameters: - fwrite_null() throws away the input, so it doesn't look at most parameters - fwrite_wwwauth() in theory could take the auth struct in its void pointer, but instead we just access it as the global http_auth (matching the rest of the code in this file) - curl_trace() always writes via the trace mechanism, so it doesn't need its void pointer to know where to send things. Likewise, it ignores the CURL parameter, since nothing we trace requires querying the handle. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:23:59 -07:00
Jeff King	1779deed39	do_for_each_ref_helper(): mark unused repository parameter This function gets a repository parameter because it's a callback for do_for_each_repo_ref_iterator(). But it's just a wrapper that passes along each call to a regular each_ref_fn callback, and the latter doesn't accept a repository argument. Probably in the long run all of the each_ref_fn callbacks should get a repository parameter, too. But changing that now would require updates all over the code base. Until that happens, let's annotate this wrapper callback to quiet the compiler's -Wunused-parameter warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:23:59 -07:00
Jeff King	b8ef49d54c	test-ref-store: drop unimplemented reflog-expire command The reflog-expire command has been unimplemented since it was added in `80f2a6097c` (t/helper: add test-ref-store to test ref-store functions, 2017-03-26). This causes -Wunused-parameter to complain, since the function just calls die() without looking at its arguments. We could mark these as UNUSED to silence the warning. But let's just drop the function. It has no callers in the test suite and is not doing anything useful, beyond perhaps reminding us that it's something we _could_ be testing. But since the bulk of the work in adding such tests would be the shell bits that actually examine the reflog state before and after expiration, this is not even a useful step in that direction. Somebody who wants to do that work later can easily add this function back. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 17:23:59 -07:00
D. Ben Knoble	bbb6acd998	i18n: mark more bundle.c strings for translation These two messages were introduced in `8ba221e245` (bundle: output hash information in 'verify', 2022-03-22) and `105c6f14ad` (bundle: parse filter capability, 2022-03-09) but never for translation. Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 15:21:10 -07:00
Alex Henrie	c577d65158	push: don't imply that integration is always required before pushing In a narrow but common case, the user is the only author of a branch and doesn't mind overwriting the corresponding branch on the remote. This workflow is especially common on GitHub, GitLab, and Gerrit, which keep a permanent record of every version of a branch that is pushed while a pull request is open for that branch. On those platforms, force-pushing is encouraged and is analogous to emailing a new version of a patchset. When giving advice about divergent branches, tell the user about `git pull`, but don't unconditionally instruct the user to do it. A less prescriptive message will help prevent users from thinking that they are required to create an integrated history instead of simply replacing the previous history. Also, don't put `git pull` in an awkward parenthetical, because `git pull` can always be used to reconcile branches and is the normal way to do so. Due to the difficulty of knowing which command for force-pushing is best suited to the user's situation, no specific advice is given about force-pushing. Instead, the user is directed to the Git documentation to read about possible ways forward that do not involve integration. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 09:14:58 -07:00
Alex Henrie	d92304ff5c	remote: don't imply that integration is always required before pushing In a narrow but common case, the user is the only author of a branch and doesn't mind overwriting the corresponding branch on the remote. This workflow is especially common on GitHub, GitLab, and Gerrit, which keep a permanent record of every version of a branch that is pushed while a pull request is open for that branch. On those platforms, force-pushing is encouraged and is analogous to emailing a new version of a patchset. When giving advice about divergent branches, tell the user about `git pull`, but don't unconditionally instruct the user to do it. A less prescriptive message will help prevent users from thinking that they are required to create an integrated history instead of simply replacing the previous history. Likewise, don't imply that `git pull` is only for merging. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 09:14:58 -07:00
Alex Henrie	b6f3da5132	wt-status: don't show divergence advice when committing When the user is in the middle of making a commit, they are not yet at the point where they are ready to think about integrating their local branch with the corresponding remote branch or force-pushing over the remote branch. Don't include advice on how to deal with divergent branches in the commit template, to avoid giving the impression that the divergence needs to be dealt with immediately. Similar advice will be printed when it is most relevant, that is, if the user does try to push without first reconciling the two branches. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-13 09:14:58 -07:00
Taylor Blau	de41d03e1c	packfile.c: prevent overflow in `nth_packed_object_id()` In `37fec86a83` (packfile: abstract away hash constant values, 2018-05-02), `nth_packed_object_id()` started using the variable `the_hash_algo->rawsz` instead of a fixed constant when trying to compute an offset into the ".idx" file for some object position. This can lead to surprising truncation when looking for an object towards the end of a large enough pack, like the following: (gdb) p hashsz $1 = 20 (gdb) p n $2 = 215043814 (gdb) p hashsz * n $3 = 5908984 , which is a debugger session broken on a known-bad call to the `nth_packed_object_id()` function. This behavior predates `37fec86a83`, and is original to the v2 index format, via: `74e34e1fca` (sha1_file.c: learn about index version 2, 2007-04-09). This is due to §6.4.4.1 of the C99 standard, which states that an untyped integer constant will take the first type in which the value can be accurately represented, among `int`, `long int`, and `long long int`. Since 20 can be represented as an `int`, and `n` is a 32-bit unsigned integer, the resulting computation is defined by §6.3.1.8, and the (signed) integer value representing `n` is converted to an unsigned type, meaning that `20 * n` (for `n` having type `uint32_t`) is equivalent to a multiplication between two unsigned 32-bit integers. When multiplying a sufficiently large `n`, the resulting value can exceed 2^32-1, wrapping around and producing an invalid result. Let's follow the example in `f86f769550` (compute pack .idx byte offsets using size_t, 2020-11-13) and replace this computation with `st_mult()`, which will ensure that the computation is done using 64-bits. While here, guard the corresponding computation for packs with v1 indexes, too. Though the likelihood of seeing a bug there is much smaller, since (a) v1 indexes are generated far less frequently than v2 indexes, and (b) they all correspond to packs no larger than 2 GiB, so having enough objects to trigger this overflow is unlikely if not impossible. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-12 21:44:59 -07:00
Taylor Blau	def390d593	builtin/repack.c: avoid dir traversal in `collect_pack_filenames()` When repacking, the function `collect_pack_filenames()` is responsible for collecting the set of existing packs in the repository, and partitioning them into "kept" (if the pack has a ".keep" file or was given via `--keep-pack`) and "nonkept" (otherwise) lists. This function comes from the original C port of git-repack.sh from back in `a1bbc6c017` (repack: rewrite the shell script in C, 2013-09-15), where it first appears as `get_non_kept_pack_filenames()`. At the time, the implementation was a fairly direct translation from the relevant portion of git-repack.sh, which looped over the results of find "$PACKDIR" -type f -name '*.pack' either ignoring the pack as kept, or adding it to the list of existing packs. So the choice to directly translate this function in terms of `readdir()` in `a1bbc6c017` made sense. At the time, it was possible to refine the C version in terms of packed_git structs, but was never done. However, manually enumerating a repository's packs via `readdir()` is confusing and error-prone. It leads to frustrating inconsistencies between which packs Git considers to be part of a repository (i.e., could be found in the list of packs from `get_all_packs()`), and which packs `collect_pack_filenames()` considers to meet the same criteria. This bit us in `73320e49ad` (builtin/repack.c: only collect fully-formed packs, 2023-06-07), and again in the previous commit. Prevent these issues from biting us in the future by implementing the `collect_pack_filenames()` function by looping over an array of pointers to `packed_git` structs, ensuring that we use the same criteria to determine the set of available packs. One gotcha here is that we have to ignore non-local packs, since the original version of `collect_pack_filenames()` only looks at the local pack directory to collect existing packs. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-11 13:07:51 -07:00
Derrick Stolee	0af067276e	builtin/repack.c: only repack `.pack`s that exist In `73320e49ad` (builtin/repack.c: only collect fully-formed packs, 2023-06-07), we switched the check for which packs to collect by starting at the .idx files and looking for matching .pack files. This avoids trying to repack pack-files that have not had their pack-indexes installed yet. However, it does cause maintenance to halt if we find the (problematic, but not insurmountable) case of a .idx file without a corresponding .pack file. In an environment where packfile maintenance is a critical function, such a hard stop is costly and requires human intervention to resolve (by deleting the .idx file). This was not the case before. We successfully repacked through this scenario until the recent change to scan for .idx files. Further, if we are actually in a case where objects are missing, we detect this at a different point during the reachability walk. In other cases, Git prepares its list of packfiles by scanning .idx files and then only adds it to the packfile list if the corresponding .pack file exists. It even does so without a warning! (See add_packed_git() in packfile.c for details.) This case is much less likely to occur than the failures seen before `73320e49ad`. Packfiles are "installed" by writing the .pack file before the .idx and that process can be interrupted. Packfiles _should_ be deleted by deleting the .idx first, followed by the .pack file, but unlink_pack_path() does not do this: it deletes the .pack _first_, allowing a window where this process could be interrupted. We leave the consideration of changing this order as a separate concern. Knowing that this condition is possible from interrupted Git processes and not other tools lends some weight that Git should be more flexible around this scenario. Add a check to see if the .pack file exists before adding it to the list for repacking. This will stop a number of maintenance failures seen in production but fixed by deleting the .idx files. This brings us closer to the case before `73320e49ad` in that 'git repack' will not fail when there is an orphaned .idx file, at least, not due to the way we scan for packfiles. In the case that the .pack file was erroneously deleted without copies of its objects in other installed packfiles, then 'git repack' will fail due to the reachable object walk. This does resolve the case where automated repacks will no longer be halted on this case. The tests in t7700 show both these successful scenarios and the case of failing if the .pack was truly required. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-11 13:07:50 -07:00
Taylor Blau	98456eff08	ls-refs.c: avoid enumerating hidden refs where possible In a similar fashion as in previous commits, teach `ls-refs` to avoid enumerating hidden references where possible. As before, this is linux.git with one hidden reference per commit. $ hyperfine -L v ,.compile 'git{v} -c protocol.version=2 ls-remote .' Benchmark 1: git -c protocol.version=2 ls-remote . Time (mean ± σ): 89.8 ms ± 0.6 ms [User: 84.3 ms, System: 5.7 ms] Range (min … max): 88.8 ms … 91.3 ms 32 runs Benchmark 2: git.compile -c protocol.version=2 ls-remote . Time (mean ± σ): 6.5 ms ± 0.1 ms [User: 2.4 ms, System: 4.3 ms] Range (min … max): 6.2 ms … 8.3 ms 397 runs Summary 'git.compile -c protocol.version=2 ls-remote .' ran 13.85 ± 0.33 times faster than 'git -c protocol.version=2 ls-remote .' Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	18b6b1b5c5	upload-pack.c: avoid enumerating hidden refs where possible In a similar fashion as a previous commit, teach `upload-pack` to avoid enumerating hidden references where possible. Note, however, that there are certain cases where cannot avoid enumerating even hidden references, in particular when either of: - `uploadpack.allowTipSHA1InWant`, or - `uploadpack.allowReachableSHA1InWant` are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`, respectively. When either of these bits are set, upload-pack's `is_our_ref()` function needs to consider the `HIDDEN_REF` bit of the referent's object flags. So we must visit all references, including the hidden ones, in order to mark their referents with the `HIDDEN_REF` bit. When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` are set, the `is_our_ref()` function considers only the `OUR_REF` bit, and not the `HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only to objects at the tips of non-hidden references, so we do not need to visit hidden references in this case. When neither of those bits are set, `upload-pack` can potentially avoid enumerating a large number of references. In the same example as a previous commit (linux.git with one hidden reference per commit, "refs/pull/N"): $ printf 0000 >in $ hyperfine --warmup=1 \ 'git -c transfer.hideRefs=refs/pull upload-pack . <in' \ 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \ 'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in Time (mean ± σ): 406.9 ms ± 1.1 ms [User: 357.3 ms, System: 49.5 ms] Range (min … max): 405.7 ms … 409.2 ms 10 runs Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in Time (mean ± σ): 406.5 ms ± 1.3 ms [User: 356.5 ms, System: 49.9 ms] Range (min … max): 404.6 ms … 408.8 ms 10 runs Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in Time (mean ± σ): 4.7 ms ± 0.2 ms [User: 0.7 ms, System: 3.9 ms] Range (min … max): 4.3 ms … 6.1 ms 472 runs Summary 'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran 86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' 86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in' As above, we must visit every reference when uploadPack.allowTipSHA1InWant is set. But when it is unset, we can visit far fewer references. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	cc2a1f98ac	builtin/receive-pack.c: avoid enumerating hidden references Now that `refs_for_each_fullref_in()` has the ability to avoid enumerating references matching certain pattern(s), use that to avoid visiting hidden refs when constructing the ref advertisement via receive-pack. Note that since this exclusion is best-effort, we still need `show_ref_cb()` to check whether or not each reference is hidden or not before including it in the advertisement. As was the case when applying this same optimization to `upload-pack`, `receive-pack`'s reference advertisement phase can proceed much quicker by avoiding enumerating references that will not be part of the advertisement. (Below, we're still using linux.git with one hidden refs/pull/N ref per commit): $ hyperfine -L v ,.compile 'git{v} -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git' Benchmark 1: git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git Time (mean ± σ): 89.1 ms ± 1.7 ms [User: 82.0 ms, System: 7.0 ms] Range (min … max): 87.7 ms … 95.5 ms 31 runs Benchmark 2: git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 0.5 ms, System: 3.9 ms] Range (min … max): 4.1 ms … 5.6 ms 508 runs Summary 'git.compile -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git' ran 20.00 ± 1.05 times faster than 'git -c transfer.hideRefs=refs/pull receive-pack --advertise-refs .git' Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	15af64dcfd	refs.h: implement `hidden_refs_to_excludes()` In subsequent commits, we'll teach `receive-pack` and `upload-pack` to use the new jump list feature in the packed-refs iterator by ignoring references which are mentioned via its respective hideRefs lists. However, the packed-ref jump lists cannot handle un-hiding rules (that begin with '!'), or namespace comparisons (that begin with '^'). Add a convenience function to the refs.h API to detect when either of these conditions are met, and returns an appropriate value to pass as excluded patterns. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	e6bf24d39a	refs.h: let `for_each_namespaced_ref()` take excluded patterns A future commit will want to call `for_each_namespaced_ref()` with a list of excluded patterns. We could introduce a variant of that function, say, `for_each_namespaced_ref_exclude()` which takes the extra parameter, and reimplement the original function in terms of that. But all but one caller (in `http-backend.c`) will supply the new parameter, so add the new parameter to `for_each_namespaced_ref()` itself instead of introducing a new function. For now, supply NULL for the list of excluded patterns at all callers to avoid changing behavior, which we will do in a future change. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	c45841fff8	revision.h: store hidden refs in a `strvec` In subsequent commits, it will be convenient to have a 'const char **' of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`, etc.), instead of a `string_list`. Convert spots throughout the tree that store the list of hidden refs from a `string_list` to a `strvec`. Note that in `parse_hide_refs_config()` there is an ugly const-cast used to avoid an extra copy of each value before trimming any trailing slash characters. This could instead be written as: ref = xstrdup(value); len = strlen(ref); while (len && ref[len - 1] == '/') ref[--len] = '\0'; strvec_push(hide_refs, ref); free(ref); but the double-copy (once when calling `xstrdup()`, and another via `strvec_push()`) is wasteful. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	c489f47a64	refs/packed-backend.c: add trace2 counters for jump list The previous commit added low-level tests to ensure that the packed-refs iterator did not enumerate excluded sections of the refspace. However, there was no guarantee that these sections weren't being visited, only that they were being suppressed from the output. To harden these tests, add a trace2 counter which tracks the number of regions skipped by the packed-refs iterator, and assert on its value. Suggested-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00
Taylor Blau	59c35fac54	refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) When iterating through the `packed-refs` file in order to answer a query like: $ git for-each-ref --exclude=refs/__hidden__ it would be useful to avoid walking over all of the entries in `refs/__hidden__/` when possible, since we know that the ref-filter code is going to throw them away anyways. In certain circumstances, doing so is possible. The algorithm for doing so is as follows: - For each excluded pattern, find the first record that matches it, and the first record that doesn't* match it (i.e. the location you'd next want to consider when excluding that pattern). - Sort the set of excluded regions from the previous step in ascending order of the first location within the `packed-refs` file that matches. - Clean up the results from the previous step: discard empty regions, and combine adjacent regions. The set of regions which remains is referred to as the "jump list", and never contains any references which should be included in the result set. Then when iterating through the `packed-refs` file, if `iter->pos` is ever contained in one of the regions from the previous steps, advance `iter->pos` past the end of that region, and continue enumeration. Note that we only perform this optimization when none of the excluded pattern(s) have special meta-characters in them. For a pattern like "refs/foo[ac]", the excluded regions ("refs/fooa", "refs/fooc", and everything underneath them) are not connected. A future implementation that handles this case may split the character class (pretending as if two patterns were excluded: "refs/fooa", and "refs/fooc"). There are a few other gotchas worth considering. First, note that the jump list is sorted, so once we jump past a region, we can avoid considering it (or any regions preceding it) again. The member `jump_pos` is used to track the first next-possible region to jump through. Second, note that the jump list is best-effort, since we do not handle loose references, and because of the meta-character issue above. The jump list may not skip past all references which won't appear in the results, but will never skip over a reference which does appear in the result set. In repositories with a large number of hidden references, the speed-up can be significant. Tests here are done with a copy of linux.git with a reference "refs/pull/N" pointing at every commit, as in: $ git rev-list HEAD \| awk '{ print "create refs/pull/" NR " " $0 }' \| git update-ref --stdin $ git pack-refs --all , it is significantly faster to have `for-each-ref` jump over the excluded references, as opposed to filtering them out after the fact: $ hyperfine \ 'git for-each-ref --format="%(objectname) %(refname)" \| grep -vE "^[0-9a-f]{40} refs/pull/"' \ 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' Benchmark 1: git for-each-ref --format="%(objectname) %(refname)" \| grep -vE "^[0-9a-f]{40} refs/pull/" Time (mean ± σ): 798.1 ms ± 3.3 ms [User: 687.6 ms, System: 146.4 ms] Range (min … max): 794.5 ms … 805.5 ms 10 runs Benchmark 2: git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull" Time (mean ± σ): 98.9 ms ± 1.4 ms [User: 93.1 ms, System: 5.7 ms] Range (min … max): 97.0 ms … 104.0 ms 29 runs Benchmark 3: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull" Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 0.7 ms, System: 3.8 ms] Range (min … max): 4.1 ms … 5.8 ms 524 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' ran 21.87 ± 1.05 times faster than 'git.prev for-each-ref --format="%(objectname) %(refname)" --exclude="refs/pull"' 176.52 ± 8.19 times faster than 'git for-each-ref --format="%(objectname) %(refname)" \| grep -vE "^[0-9a-f]{40} refs/pull/"' (Comparing stock git and this patch isn't quite fair, since an earlier commit in this series adds a naive implementation of the `--exclude` option. `git.prev` is built from the previous commit and includes this naive implementation). Using the jump list is fairly straightforward (see the changes to `refs/packed-backend.c::next_record()`), but constructing the list is not. To ensure that the construction is correct, add a new suite of tests in t1419 covering various corner cases (overlapping regions, partially overlapping regions, adjacent regions, etc.). Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00
Taylor Blau	d22d941ac0	refs/packed-backend.c: refactor `find_reference_location()` The function `find_reference_location()` is used to perform a binary search-like function over the contents of a repository's `$GIT_DIR/packed-refs` file. The search it implements is unlike a standard binary search in that the records it searches over are not of a fixed width, so the comparison must locate the end of a record before comparing it. Extract the core routine of `find_reference_location()` in order to implement a function in the following patch which will find the first location in the `packed-refs` file that doesn't match the given pattern. The behavior of `find_reference_location()` is unchanged. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00
Taylor Blau	b269ac53c0	refs: plumb `exclude_patterns` argument throughout The subsequent patch will want to access an optional `excluded_patterns` array within `refs/packed-backend.c` that will cull out certain references matching any of the given patterns on a best-effort basis. To do so, the refs subsystem needs to be updated to pass this value across a number of different locations. Prepare for a future patch by introducing this plumbing now, passing NULLs at top-level APIs in order to make that patch less noisy and more easily readable. Signed-off-by: Taylor Blau <me@ttaylorr.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00
Taylor Blau	8255dd8a3d	builtin/for-each-ref.c: add `--exclude` option When using `for-each-ref`, it is sometimes convenient for the caller to be able to exclude certain parts of the references. For example, if there are many `refs/__hidden__/` references, the caller may want to emit all references except* the hidden ones. Currently, the only way to do this is to post-process the output, like: $ git for-each-ref --format='%(refname)' \| grep -v '^refs/hidden/' Which is do-able, but requires processing a potentially large quantity of references. Teach `git for-each-ref` a new `--exclude=<pattern>` option, which excludes references from the results if they match one or more excluded patterns. This patch provides a naive implementation where the `ref_filter` still sees all references (including ones that it will discard) and is left to check whether each reference matches any excluded pattern(s) before emitting them. By culling out references we know the caller doesn't care about, we can avoid allocating memory for their storage, as well as spending time sorting the output (among other things). Even the naive implementation provides a significant speed-up on a modified copy of linux.git (that has a hidden ref pointing at each commit): $ hyperfine \ 'git.compile for-each-ref --format="%(objectname) %(refname)" \| grep -vE "[0-9a-f]{40} refs/pull/"' \ 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" \| grep -vE "[0-9a-f]{40} refs/pull/" Time (mean ± σ): 820.1 ms ± 2.0 ms [User: 703.7 ms, System: 152.0 ms] Range (min … max): 817.7 ms … 823.3 ms 10 runs Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/ Time (mean ± σ): 106.6 ms ± 1.1 ms [User: 99.4 ms, System: 7.1 ms] Range (min … max): 104.7 ms … 109.1 ms 27 runs Summary 'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran 7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" \| grep -vE "[0-9a-f]{40} refs/pull/"' Subsequent patches will improve on this by avoiding visiting excluded sections of the `packed-refs` file in certain cases. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00
Jeff King	284c55deb5	ref-filter.c: parameterize match functions over patterns `match_pattern()` and `match_name_as_path()` both take a `struct ref_filter `, and then store a stack variable `patterns` pointing at `filter->patterns`. The subsequent patch will add a new array of patterns to match over (the excluded patterns, via a new `git for-each-ref --exclude` option), treating the return value of these functions differently depending on which patterns are being used to match. Tweak `match_pattern()` and `match_name_as_path()` to take an array of patterns to prepare for passing either in. Once we start passing either in, `match_pattern()` will have little to do with a particular `struct ref_filter ` instance. To clarify this, drop it from the argument list, and replace it with the only bit of the `ref_filter` that we care about (`filter->ignore_case`). Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00
Jeff King	b571fb9800	ref-filter: add `ref_filter_clear()` We did not bother to clean up at all in `git branch` or `git tag`, and `git for-each-ref` only cleans up a couple of members. Add and call `ref_filter_clear()` when cleaning up a `struct ref_filter`. Running this patch (without any test changes) indicates a couple of now leak-free tests. This was found by running: $ make SANITIZE=leak $ make -C t GIT_TEST_PASSING_SANITIZE_LEAK=check GIT_TEST_OPTS=--immediate (Note that the `reachable_from` and `unreachable_from` lists should be cleaned as they are used. So this is just covering any case where we might bail before running the reachability check.) Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00
Jeff King	311bfe18ce	ref-filter: clear reachable list pointers after freeing In `reach_filter()`, we pop all commits from the reachable lists, leaving them empty. But because we're operating on a list pointer that was passed by value, the original `filter.reachable_from` and `filter.unreachable_from` pointers are left dangling. As is the case with the previous commit, nobody touches either of these fields after calling `reach_filter()`, so leaving them dangling is OK. But this future proofs against dangerous situations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00

... 3 4 5 6 7 ...

70882 commits