development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-30 14:03:28 +00:00

Author	SHA1	Message	Date
Junio C Hamano	a82fa7bce8	Merge branch 'jk/upload-pack-v2-capability-cleanup' The upload-pack program, when talking over v2, accepted the packfile-uris protocol extension from the client, even if it did not advertise the capability, which has been corrected. * jk/upload-pack-v2-capability-cleanup: upload-pack: only accept packfile-uris if we advertised it upload-pack: use existing config mechanism for advertisement upload-pack: centralize setup of sideband-all config upload-pack: use repository struct to get config	2024-03-07 15:59:42 -08:00
Junio C Hamano	56d6084560	Merge branch 'jk/upload-pack-bounded-resources' Various parts of upload-pack has been updated to bound the resource consumption relative to the size of the repository to protect from abusive clients. * jk/upload-pack-bounded-resources: upload-pack: free tree buffers after parsing upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places upload-pack: always turn off save_commit_buffer upload-pack: disallow object-info capability by default upload-pack: accept only a single packfile-uri line upload-pack: use a strmap for want-ref lines upload-pack: use oidset for deepen_not list upload-pack: switch deepen-not list to an oid_array upload-pack: drop separate v2 "haves" array	2024-03-07 15:59:42 -08:00
Jeff King	a922bfa3b5	upload-pack: only accept packfile-uris if we advertised it Clients are only supposed to request particular capabilities or features if the server advertised them. For the "packfile-uris" feature, we only advertise it if uploadpack.blobpacfileuri is set, but we always accept a request from the client regardless. In practice this doesn't really hurt anything, as we'd pass the client's protocol list on to pack-objects, which ends up ignoring it. But we should try to follow the protocol spec, and tightening this up may catch buggy or misbehaving clients more easily. Thanks to recent refactoring, we can hoist the config check from upload_pack_advertise() into upload_pack_config(). Note the subtle handling of a value-less bool (which does not count for triggering an advertisement). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-29 08:10:42 -08:00
Jeff King	9a7b22959a	upload-pack: use existing config mechanism for advertisement When serving a v2 capabilities request, we call upload_pack_advertise() to tell us the set of features we can advertise to the client. That involves looking at various config options, all of which need to be kept in sync with the rules we use in upload_pack_config to set flags like allow_filter, allow_sideband_all, and so on. If these two pieces of code get out of sync then we may refuse to respect a capability we advertised, or vice versa accept one that we should not. Instead, let's call the same config helper that we'll use for processing the actual client request, and then just pick the values out of the resulting struct. This is only a little bit shorter than the current code, but we don't repeat any policy logic (e.g., we don't have to worry about the magic sideband-all environment variable here anymore). And this reveals a gap in the existing code: there is no struct flag for the packfile-uris capability (we accept it even if it is not advertised, which we should not). We'll leave the advertisement code for now and deal with it in the next patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 15:30:41 -08:00
Jeff King	37aa89b068	upload-pack: centralize setup of sideband-all config We read uploadpack.allowsidebandall to set a matching flag in our upload_pack_data struct. But for our tests, we also respect GIT_TEST_SIDEBAND_ALL from the environment, and anybody looking at the flag in the struct needs to remember to check both. There's only one such piece of code now, but we're about to add another. So let's have the config step actually fold the environment value into the struct, letting the rest of the code use the flag in the obvious way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 15:30:41 -08:00
Jeff King	922fdefb84	upload-pack: use repository struct to get config Our upload_pack_v2() function gets a repository struct, but we ignore it totally. In practice this doesn't cause any problems, as it will never differ from the_repository. But in the spirit of taking a small step towards getting rid of the_repository, let's at least starting using it to grab config. There are probably other spots that could benefit, but it's a start. Note that we don't need to pass the repo for protected_config(); the whole point there is that we are not looking at repo config, so there is no repo-specific version of the function. For the v0 version of the protocol, we're not passed a repository struct, so we'll continue to use the_repository there. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 14:48:35 -08:00
Jeff King	6cd05e768b	upload-pack: free tree buffers after parsing When a client sends us a "want" or "have" line, we call parse_object() to get an object struct. If the object is a tree, then the parsed state means that tree->buffer points to the uncompressed contents of the tree. But we don't really care about it. We only really need to parse commits and tags; for trees and blobs, the important output is just a "struct object" with the correct type. But much worse, we do not ever free that tree buffer. It's not leaked in the traditional sense, in that we still have a pointer to it from the global object hash. But if the client requests many trees, we'll hold all of their contents in memory at the same time. Nobody really noticed because it's rare for clients to directly request a tree. It might happen for a lightweight tag pointing straight at a tree, or it might happen for a "tree:depth" partial clone filling in missing trees. But it's also possible for a malicious client to request a lot of trees, causing upload-pack's memory to balloon. For example, without this patch, requesting every tree in git.git like: pktline() { local msg="$*" printf "%04x%s\n" $((1+4+${#msg})) "$msg" } want_trees() { pktline command=fetch printf 0001 git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' \| while read oid type; do test "$type" = "tree" \|\| continue pktline want $oid done pktline done printf 0000 } want_trees \| GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null shows a peak heap usage of ~3.7GB. Which is just about the sum of the sizes of all of the uncompressed trees. For linux.git, it's closer to 17GB. So the obvious thing to do is to call free_tree_buffer() after we realize that we've parsed a tree. We know that upload-pack won't need it later. But let's push the logic into parse_object_with_flags(), telling it to discard the tree buffer immediately. There are two reasons for this. One, all of the relevant call-sites already call the with_options variant to pass the SKIP_HASH flag. So it actually ends up as less code than manually free-ing in each spot. And two, it enables an extra optimization that I'll discuss below. I've touched all of the sites that currently use SKIP_HASH in upload-pack. That drops the peak heap of the upload-pack invocation above from 3.7GB to ~24MB. I've also modified the caller in get_reference(); a partial clone benefits from its use in pack-objects for the reasons given in `0bc2557951` (upload-pack: skip parse-object re-hashing of "want" objects, 2022-09-06), where we were measuring blob requests. But note that the results of get_reference() are used for traversing, as well; so we really would _eventually_ use the tree contents. That makes this at first glance a space/time tradeoff: we won't hold all of the trees in memory at once, but we'll have to reload them each when it comes time to traverse. And here's where our extra optimization comes in. If the caller is not going to immediately look at the tree contents, and it doesn't care about checking the hash, then parse_object() can simply skip loading the tree entirely, just like we do for blobs! And now it's not a space/time tradeoff in get_reference() anymore. It's just a lazy-load: we're delaying reading the tree contents until it's time to actually traverse them one by one. And of course for upload-pack, this optimization means we never load the trees at all, saving lots of CPU time. Timing the "every tree from git.git" request above shows upload-pack dropping from 32 seconds of CPU to 19 (the remainder is mostly due to pack-objects actually sending the pack; timing just the upload-pack portion shows we go from 13s to ~0.28s). These are all highly gamed numbers, of course. For real-world partial-clone requests we're saving only a small bit of time in practice. But it does help harden upload-pack against malicious denial-of-service attacks. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 14:42:01 -08:00
Jeff King	a6ca601cdf	upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places In commit `0bc2557951` (upload-pack: skip parse-object re-hashing of "want" objects, 2022-09-06), we optimized the parse_object() calls for v2 "want" lines from the client so that they avoided parsing blobs, and so that they used the commit-graph rather than parsing commit objects from scratch. We should extend that to two other spots: 1. We parse "have" objects in the got_oid() function. These won't generally be non-commits (unlike "want" lines from a partial clone). But we still benefit from the use of the commit-graph. 2. For v0, the "want" lines are parsed in receive_needs(). These are also less likely to be non-commits because by default they have to be ref tips. There are config options you might set to allow non-tip objects, but you'd mostly do so to support partial clones, and clients recent enough to support partial clone will generally speak v2 anyway. So I don't expect this change to improve performance much for day-to-day operations. But both are possible denial-of-service vectors, where an attacker can waste our time by sending over a large number of objects to parse (of course we may waste even more time serving a pack to them, but we try as much as possible to optimize that in pack-objects; we should do what we can here in upload-pack, too). With this patch, running p5600 with GIT_TEST_PROTOCOL_VERSION=0 shows similar results to what we saw in `0bc2557951` (which ran with the v2 protocol by default). Here are the numbers for linux.git: Test HEAD^ HEAD ----------------------------------------------------------------------------- 5600.3: checkout of result 50.91(87.95+2.93) 41.75(79.00+3.18) -18.0% Or for a more extreme (and malicious) case, we can claim to "have" every blob in git.git over the v0 protocol: $ { echo "0032want $(git rev-parse HEAD)" printf 0000 git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' \| perl -alne 'print "0032have $F[0]" if $F[1] eq "blob"' } >input $ time ./git.old upload-pack . <input >/dev/null real 0m52.951s user 0m51.633s sys 0m1.304s $ time ./git.new upload-pack . <input >/dev/null real 0m0.261s user 0m0.156s sys 0m0.105s (Note that these don't actually compute a pack because of the hacky protocol usage, so those numbers are representing the raw blob-parsing effort done by upload-pack). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 14:42:01 -08:00
Jeff King	5f64279443	upload-pack: always turn off save_commit_buffer When the client sends us "want $oid" lines, we call parse_object($oid) to get an object struct. It's important to parse the commits because we need to traverse them in the negotiation phase. But of course we don't need to hold on to the commit messages for each one. We've turned off the save_commit_buffer flag in get_common_commits() for a long time, since `f0243f26f6` (git-upload-pack: More efficient usage of the has_sha1 array, 2005-10-28). That helps with the commits we see while actually traversing. But: 1. That function is only used by the v0 protocol. I think the v2 protocol's code path leaves the flag on (and thus pays the extra memory penalty), though I didn't measure it specifically. 2. If the client sends us a bunch of "want" lines, that happens before the negotiation phase. So we'll hold on to all of those commit messages. Generally the number of "want" lines scales with the refs, not with the number of objects in the repo. But a malicious client could send a lot in order to waste memory. As an example of (2), if I generate a request to fetch all commits in git.git like this: pktline() { local msg="$*" printf "%04x%s\n" $((1+4+${#msg})) "$msg" } want_commits() { pktline command=fetch printf 0001 git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' \| while read oid type; do test "$type" = "commit" \|\| continue pktline want $oid done pktline done printf 0000 } want_commits \| GIT_PROTOCOL=version=2 valgrind --tool=massif git-upload-pack . >/dev/null before this patch upload-pack peaks at ~125MB, and after at ~35MB. The difference is not coincidentally about the same as the sum of all commit object sizes as computed by: git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectsize)' \| perl -alne '$v += $F[1] if $F[0] eq "commit"; END { print $v }' In a larger repository like linux.git, that number is ~1GB. In a repository with a full commit-graph file this will have no impact (and the commit graph would save us from parsing at all, so is a much better solution!). But it's easy to do, might help a little in real-world cases (where even if you have a commit graph it might not be fully up to date), and helps a lot for a worst-case malicious request. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 14:42:01 -08:00
Jeff King	179776f9e6	upload-pack: accept only a single packfile-uri line When we see a packfile-uri line from the client, we use string_list_split() to split it on commas and store the result in a string_list. A single packfile-uri line is therefore limited to storing ~64kb, the size of a pkt-line. But we'll happily accept multiple such lines, and each line appends to the string list, growing without bound. In theory this could be useful, making: 0017packfile-uris http 0018packfile-uris https equivalent to: 001dpackfile-uris http,https But the protocol documentation doesn't indicate that this should work (and indeed, refers to this in the singular as "the following argument can be included in the client's request"). And the client-side implementation in fetch-pack has always sent a single line (JGit appears to understand the line on the server side but has no client-side implementation, and libgit2 understands neither). If we were worried about compatibility, we could instead just put a limit on the maximum number of values we'd accept. The current client implementation limits itself to only two values: "http" and "https", so something like "256" would be more than enough. But accepting only a single line seems more in line with the protocol documentation, and matches other parts of the protocol (e.g., we will not accept a second "filter" line). We'll also make this more explicit in the protocol documentation; as above, I think this was always the intent, but there's no harm in making it clear. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 14:42:01 -08:00
Jeff King	b065063c57	upload-pack: use a strmap for want-ref lines When the "ref-in-want" capability is advertised (which it is not by default), then upload-pack processes a "want-ref" line from the client by checking that the name is a valid ref and recording it in a string-list. In theory this list should grow no larger than the number of refs in the server-side repository. But since we don't do any de-duplication, a client which sends "want-ref refs/heads/foo" over and over will cause the array to grow without bound. We can fix this by switching to strmap, which efficiently detects duplicates. There are two client-visible changes here: 1. The "wanted-refs" response will now be in an apparently-random order (based on iterating the hashmap) rather than the order given by the client. The protocol documentation is quiet on ordering here. The current fetch-pack implementation is happy with any order, as it looks up each returned ref using a binary search in its local sorted list. JGit seems to implement want-ref on the server side, but has no client-side support. libgit2 doesn't support either side. It would obviously be possible to record the original order or to use the strmap as an auxiliary data structure. But if the client doesn't care, we may as well do the simplest thing. 2. We'll now reject duplicates explicitly as a protocol error. The client should never send them (and our current implementation, even when asked to "git fetch master:one master:two" will de-dup on the client side). If we wanted to be more forgiving, we could perhaps just throw away the duplicates. But then our "wanted-refs" response back to the client would omit the duplicates, and it's hard to say what a client that accidentally sent a duplicate would do with that. So I think we're better off to complain loudly before anybody accidentally writes such a client. Let's also add a note to the protocol documentation clarifying that duplicates are forbidden. As discussed above, this was already the intent, but it's not very explicit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 14:42:01 -08:00
Jeff King	388b96df31	upload-pack: use oidset for deepen_not list We record the oid of every deepen-not line the client sends to us. For a well-behaved client, the resulting array should be bounded by the number of unique refs we have. But because there's no de-duplication, a malicious client can cause the array to grow unbounded by just sending the same "refs/heads/foo" over and over (assuming such a ref exists). Since the deepen-not list is just being fed to a "rev-list --not" traversal, the order of items doesn't matter. So we can replace the oid_array with an oidset which notices and skips duplicates. That bounds the memory in malicious cases to be linear in the number of unique refs. And even in non-malicious cases, there may be a slight improvement in memory usage if multiple refs point to the same oid (though in practice this list is probably pretty tiny anyway, as it comes from the user specifying "--shallow-exclude" on the client fetch). Note that in the trace2 output we'll now output the number of de-duplicated objects, rather than the total number of "deepen-not" lines we received. This is arguably a more useful value for tracing / debugging anyway. Reported-by: Benjamin Flesch <benjaminflesch@icloud.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 14:42:01 -08:00
Jeff King	720ba25d99	upload-pack: switch deepen-not list to an oid_array When we see a "deepen-not" line from the client, we verify that the given name can be resolved as a ref, and then add it to a string list to be passed later to an internal "rev-list --not" traversal. We record the actual refname in the string list (so the traversal resolves it again later), but we'd be better off recording the resolved oid: 1. There's a tiny bit of wasted work in resolving it twice. 2. There's a small race condition with simultaneous updates; the later traversal may resolve to a different value (or not at all). This shouldn't cause any bad behavior (we do not care about the value in this first resolution, so whatever value rev-list gets is OK) but it could mean a confusing error message (if upload-pack fails to resolve the ref it produces a useful message, but a failing traversal later results in just "revision walk setup failed"). 3. It makes it simpler to de-duplicate the results. We don't de-dup at all right now, but we will in the next patch. >From the client's perspective the behavior should be the same. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 14:42:01 -08:00
Jeff King	fae9627470	upload-pack: drop separate v2 "haves" array When upload-pack sees a "have" line in the v0 protocol, it immediately calls got_oid() with its argument and potentially produces an ACK response. In the v2 protocol, we simply record the argument in an oid_array, and only later process all of the "have" objects by calling the equivalent of got_oid() on the contents of the array. This makes some sense, as v2 is a pure request/response protocol, as opposed to v0's asynchronous negotiation phase. But there's a downside: a client can send us an infinite number of garbage "have" lines, which we'll happily slurp into the array, consuming memory. Whereas in v0, they are limited by the number of objects in the repository (because got_oid() only records objects we have ourselves, and we avoid duplicates by setting a flag on the object struct). We can make v2 behave more like v0 by also calling got_oid() directly when v2 parses a "have" line. Calling it early like this is OK because got_oid() itself does not interact with the client; it only confirms that we have the object and sets a few flags. Note that unlike v0, v2 does not ever (before or after this patch) check the return code of got_oid(), which lets the caller know whether we have the object. But again, that makes sense; v0 is using it to asynchronously tell the client to stop sending. In v2's synchronous protocol, we just discard those entries (and decide how to ACK at the end of each round). There is one slight tweak we need, though. In v2's state machine, we reach the SEND_ACKS state if the other side sent us any "have" lines, whether they were useful or not. Right now we do that by checking whether the "have" array had any entries, but if we record only the useful ones, that doesn't work. Instead, we can add a simple boolean that tells us whether we saw any have line (even if it was useless). This lets us drop the "haves" array entirely, as we're now placing objects directly into the "have_obj" object array (which is where got_oid() put them in the long run anyway). And as a bonus, we can drop the secondary "common" array used in process_haves_and_send_acks(). It was essentially a copy of "haves" minus the objects we do not have. But now that we are using "have_obj" directly, we know everything in it is useful. So in addition to protecting ourselves against malicious input, we should slightly lower our memory usage for normal inputs. Note that there is one user-visible effect. The trace2 output records the number of "haves". Previously this was the total number of "have" lines we saw, but now is the number of useful ones. We could retain the original meaning by keeping a separate counter, but it doesn't seem worth the effort; this trace info is for debugging and metrics, and arguably the count of common oids is at least as useful as the total count. Reported-by: Benjamin Flesch <benjaminflesch@icloud.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-28 14:42:01 -08:00
SZEDER Gábor	3f4c7a0805	upload-pack: don't send null character in abort message to the client Since `583b7ea31b` (upload-pack/fetch-pack: support side-band communication, 2006-06-21) the abort message sent by upload-pack in case of possible repository corruption ends with a null character. This can be seen in several test cases in 't5530-upload-pack-error.sh' where 'grep <pattern> output.err' often reports "Binary file output.err matches" because of that null character. The reason for this is that the abort message is defined as a string literal, and we pass its size to the send function as sizeof(abort_msg), which also counts the terminating null character. Use strlen() instead to avoid sending that terminating null character. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-26 09:49:08 -08:00
Elijah Newren	eea0e59ffb	treewide: remove unnecessary includes in source files Each of these were checked with gcc -E -I. ${SOURCE_FILE} \| grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-12-26 12:04:31 -08:00
Robert Coup	b8f58c200c	upload-pack: add tracing for fetches Information on how users are accessing hosted repositories can be helpful to server operators. For example, being able to broadly differentiate between fetches and initial clones; the use of shallow repository features; or partial clone filters. `a29263c` (fetch-pack: add tracing for negotiation rounds, 2022-08-02) added some information on have counts to fetch-pack itself to help diagnose negotiation; but from a git-upload-pack (server) perspective, there's no means of accessing such information without using GIT_TRACE_PACKET to examine the protocol packets. Improve this by emitting a Trace2 JSON event from upload-pack with summary information on the contents of a fetch request. * haves, wants, and want-ref counts can help determine (broadly) between fetches and clones, and the use of single-branch, etc. * shallow clone depth, tip counts, and deepening options. * any partial clone filter type. Signed-off-by: Robert Coup <robert@coup.net.nz> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-30 21:43:21 +09:00
Junio C Hamano	84d79009d9	Merge branch 'ds/upload-pack-error-sequence-fix' Error message generation fix. * ds/upload-pack-error-sequence-fix: upload-pack: fix exit code when denying fetch of unreachable object ID upload-pack: fix race condition in error messages	2023-08-24 09:32:33 -07:00
Patrick Steinhardt	5f33a843de	upload-pack: fix exit code when denying fetch of unreachable object ID In `7ba7c52d76` (upload-pack: fix race condition in error messages, 2023-08-10), we have fixed a race in t5516-fetch-push.sh where sometimes error messages got intermingled. This was done by splitting up the call to `die()` such that we print the error message before writing to the remote side, followed by a call to `exit(1)` afterwards. This causes a subtle regression though as `die()` causes us to exit with exit code 128, whereas we now call `exit(1)`. It's not really clear whether we want to guarantee any specific error code in this case, and neither do we document anything like that. But on the other hand, it seems rather clear that this is an unintended side effect of the change given that this change in behaviour was not mentioned at all. Restore the status-quo by exiting with 128. The test in t5703 to ensure that "git fetch" fails by using test_must_fail, which does not care between exiting 1 and 128, so this changes will not affect any test. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-08-16 09:17:46 -07:00
Derrick Stolee	7ba7c52d76	upload-pack: fix race condition in error messages Test t5516-fetch-push.sh has a test 'deny fetch unreachable SHA1, allowtipsha1inwant=true' that checks stderr for a specific error string from the remote. In some build environments the error sent over the remote connection gets mingled with the error from the die() statement. Since both signals are being output to the same file descriptor (but from parent and child processes), the output we are matching with grep gets split. To reduce the risk of this failure, follow this process instead: 1. Write an error message to stderr. 2. Write an error message across the connection. 3. exit(1). This reorders the events so the error is written entirely before the client receives a message from the remote, removing the race condition. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-08-10 09:15:27 -07:00
Junio C Hamano	39fe402d67	Merge branch 'tb/refs-exclusion-and-packed-refs' Enumerating refs in the packed-refs file, while excluding refs that match certain patterns, has been optimized. * tb/refs-exclusion-and-packed-refs: ls-refs.c: avoid enumerating hidden refs where possible upload-pack.c: avoid enumerating hidden refs where possible builtin/receive-pack.c: avoid enumerating hidden references refs.h: implement `hidden_refs_to_excludes()` refs.h: let `for_each_namespaced_ref()` take excluded patterns revision.h: store hidden refs in a `strvec` refs/packed-backend.c: add trace2 counters for jump list refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) refs/packed-backend.c: refactor `find_reference_location()` refs: plumb `exclude_patterns` argument throughout builtin/for-each-ref.c: add `--exclude` option ref-filter.c: parameterize match functions over patterns ref-filter: add `ref_filter_clear()` ref-filter: clear reachable list pointers after freeing ref-filter.h: provide `REF_FILTER_INIT` refs.c: rename `ref_filter`	2023-07-21 13:47:26 -07:00
Junio C Hamano	ce481ac8b3	Merge branch 'cw/compat-util-header-cleanup' Further shuffling of declarations across header files to streamline file dependencies. * cw/compat-util-header-cleanup: git-compat-util: move alloc macros to git-compat-util.h treewide: remove unnecessary includes for wrapper.h kwset: move translation table from ctype sane-ctype.h: create header for sane-ctype macros git-compat-util: move wrapper.c funcs to its header git-compat-util: move strbuf.c funcs to its header	2023-07-17 11:30:42 -07:00
Taylor Blau	18b6b1b5c5	upload-pack.c: avoid enumerating hidden refs where possible In a similar fashion as a previous commit, teach `upload-pack` to avoid enumerating hidden references where possible. Note, however, that there are certain cases where cannot avoid enumerating even hidden references, in particular when either of: - `uploadpack.allowTipSHA1InWant`, or - `uploadpack.allowReachableSHA1InWant` are set, corresponding to `ALLOW_TIP_SHA1` and `ALLOW_REACHABLE_SHA1`, respectively. When either of these bits are set, upload-pack's `is_our_ref()` function needs to consider the `HIDDEN_REF` bit of the referent's object flags. So we must visit all references, including the hidden ones, in order to mark their referents with the `HIDDEN_REF` bit. When neither `ALLOW_TIP_SHA1` nor `ALLOW_REACHABLE_SHA1` are set, the `is_our_ref()` function considers only the `OUR_REF` bit, and not the `HIDDEN_REF` one. `OUR_REF` is applied via `mark_our_ref()`, and only to objects at the tips of non-hidden references, so we do not need to visit hidden references in this case. When neither of those bits are set, `upload-pack` can potentially avoid enumerating a large number of references. In the same example as a previous commit (linux.git with one hidden reference per commit, "refs/pull/N"): $ printf 0000 >in $ hyperfine --warmup=1 \ 'git -c transfer.hideRefs=refs/pull upload-pack . <in' \ 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' \ 'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' Benchmark 1: git -c transfer.hideRefs=refs/pull upload-pack . <in Time (mean ± σ): 406.9 ms ± 1.1 ms [User: 357.3 ms, System: 49.5 ms] Range (min … max): 405.7 ms … 409.2 ms 10 runs Benchmark 2: git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in Time (mean ± σ): 406.5 ms ± 1.3 ms [User: 356.5 ms, System: 49.9 ms] Range (min … max): 404.6 ms … 408.8 ms 10 runs Benchmark 3: git.compile -c transfer.hideRefs=refs/pull upload-pack . <in Time (mean ± σ): 4.7 ms ± 0.2 ms [User: 0.7 ms, System: 3.9 ms] Range (min … max): 4.3 ms … 6.1 ms 472 runs Summary 'git.compile -c transfer.hideRefs=refs/pull upload-pack . <in' ran 86.62 ± 4.33 times faster than 'git.compile -c transfer.hideRefs=refs/pull -c uploadpack.allowTipSHA1InWant upload-pack . <in' 86.70 ± 4.33 times faster than 'git -c transfer.hideRefs=refs/pull upload-pack . <in' As above, we must visit every reference when uploadPack.allowTipSHA1InWant is set. But when it is unset, we can visit far fewer references. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	e6bf24d39a	refs.h: let `for_each_namespaced_ref()` take excluded patterns A future commit will want to call `for_each_namespaced_ref()` with a list of excluded patterns. We could introduce a variant of that function, say, `for_each_namespaced_ref_exclude()` which takes the extra parameter, and reimplement the original function in terms of that. But all but one caller (in `http-backend.c`) will supply the new parameter, so add the new parameter to `for_each_namespaced_ref()` itself instead of introducing a new function. For now, supply NULL for the list of excluded patterns at all callers to avoid changing behavior, which we will do in a future change. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	c45841fff8	revision.h: store hidden refs in a `strvec` In subsequent commits, it will be convenient to have a 'const char **' of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`, etc.), instead of a `string_list`. Convert spots throughout the tree that store the list of hidden refs from a `string_list` to a `strvec`. Note that in `parse_hide_refs_config()` there is an ugly const-cast used to avoid an extra copy of each value before trimming any trailing slash characters. This could instead be written as: ref = xstrdup(value); len = strlen(ref); while (len && ref[len - 1] == '/') ref[--len] = '\0'; strvec_push(hide_refs, ref); free(ref); but the double-copy (once when calling `xstrdup()`, and another via `strvec_push()`) is wasteful. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Junio C Hamano	b3d1c85d48	Merge branch 'gc/config-context' Reduce reliance on a global state in the config reading API. * gc/config-context: config: pass source to config_parser_event_fn_t config: add kvi.path, use it to evaluate includes config.c: remove config_reader from configsets config: pass kvi to die_bad_number() trace2: plumb config kvi config.c: pass ctx with CLI config config: pass ctx with config files config.c: pass ctx in configsets config: add ctx arg to config_fn_t urlmatch.h: use config_fn_t type config: inline git_color_default_config	2023-07-06 11:54:48 -07:00
Calvin Wan	da9502ff4d	treewide: remove unnecessary includes for wrapper.h Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-05 11:41:59 -07:00
Glen Choo	8868b1ebfb	config: pass kvi to die_bad_number() Plumb "struct key_value_info" through all code paths that end in die_bad_number(), which lets us remove the helper functions that read analogous values from "struct config_reader". As a result, nothing reads config_reader.config_kvi any more, so remove that too. In config.c, this requires changing the signature of git_configset_get_value() to 'return' "kvi" in an out parameter so that git_configset_get_<type>() can pass it to git_config_<type>(). Only numeric types will use "kvi", so for non-numeric types (e.g. git_configset_get_string()), pass NULL to indicate that the out parameter isn't needed. Outside of config.c, config callbacks now need to pass "ctx->kvi" to any of the git_config_<type>() functions that parse a config string into a number type. Included is a .cocci patch to make that refactor. The only exceptional case is builtin/config.c, where git_config_<type>() is called outside of a config callback (namely, on user-provided input), so config source information has never been available. In this case, die_bad_number() defaults to a generic, but perfectly descriptive message. Let's provide a safe, non-NULL for "kvi" anyway, but make sure not to change the message. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-28 14:06:40 -07:00
Glen Choo	a4e7e317f8	config: add ctx arg to config_fn_t Add a new "const struct config_context ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-28 14:06:39 -07:00
Elijah Newren	a034e9106f	object-store-ll.h: split this header out of object-store.h The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store \| sort \| uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:54 -07:00
brian m. carlson	933e3a4ee2	upload-pack: advertise capabilities when cloning empty repos When cloning an empty repository, protocol versions 0 and 1 currently offer nothing but the header and flush packets for the /info/refs endpoint. This means that no capabilities are provided, so the client side doesn't know what capabilities are present. However, this does pose a problem when working with SHA-256 repositories, since we use the capabilities to know the remote side's object format (hash algorithm). As of `8b214c2e9d` ("clone: propagate object-format when cloning from void", 2023-04-05), this has been fixed for protocol v2, since there we always read the hash algorithm from the remote. Fortunately, the push version of the protocol already indicates a clue for how to solve this. When the /info/refs endpoint is accessed for a push and the remote is empty, we include a dummy "capabilities^{}" ref pointing to the all-zeros object ID. The protocol documentation already indicates this should _always_ be sent, even for fetches and clones, so let's just do that, which means we'll properly announce the hash algorithm as part of the capabilities. This just works with the existing code because we share the same ref code for fetches and clones, and libgit2, JGit, and dulwich do as well. There is one minor issue to fix, though. If we called send_ref with namespaces, we would return NULL with the capabilities entry, which would cause a crash. Instead, let's refactor out a function to print just the ref itself without stripping the namespace and use it for our special capabilities entry. Add several sets of tests for HTTP as well as for local clones. The behavior can be slightly different for HTTP versus a local or SSH clone because of the stateless-rpc functionality, so it's worth testing both. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-05-17 13:22:46 -07:00
Junio C Hamano	80d268f309	Merge branch 'jk/protocol-cap-parse-fix' The code to parse capability list for v0 on-wire protocol fell into an infinite loop when a capability appears multiple times, which has been corrected. * jk/protocol-cap-parse-fix: v0 protocol: use size_t for capability length/offset t5512: test "ls-remote --heads --symref" filtering with v0 and v2 t5512: allow any protocol version for filtered symref test t5512: add v2 support for "ls-remote --symref" test v0 protocol: fix sha1/sha256 confusion for capabilities^{} t5512: stop referring to "v1" protocol v0 protocol: fix infinite loop when parsing multi-valued capabilities	2023-04-25 13:56:20 -07:00
Jeff King	7ce4c8f752	v0 protocol: use size_t for capability length/offset When parsing server capabilities, we use "int" to store lengths and offsets. At first glance this seems like a spot where our parser may be confused by integer overflow if somebody sent us a malicious response. In practice these strings are all bounded by the 64k limit of a pkt-line, so using "int" is OK. However, it makes the code simpler to audit if they just use size_t everywhere. Note that because we take these parameters as pointers, this also forces many callers to update their declared types. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 15:08:13 -07:00
Elijah Newren	65156bb7ec	treewide: remove double forward declaration of read_in_full cache.h's nature of a dumping ground of includes prevented it from being included in some compat/ files, forcing us into a workaround of having a double forward declaration of the read_in_full() function (see commit `14086b0a13` ("compat/pread.c: Add a forward declaration to fix a warning", 2007-11-17)). Now that we have moved functions like read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered with unrelated and scary #defines, get rid of the extra forward declaration and just have compat/pread.c include wrapper.h. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:11 -07:00
Elijah Newren	5579f44d2f	treewide: remove unnecessary cache.h inclusion Several files were including cache.h solely to get other headers, such as trace.h and trace2.h. Since the last few commits have modified files to make these dependencies more explicit, the inclusion of cache.h is no longer needed in several cases. Remove it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:09 -07:00
Elijah Newren	6f2d743043	treewide: be explicit about dependence on oid-array.h Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:09 -07:00
Elijah Newren	74ea5c9574	treewide: be explicit about dependence on trace.h & trace2.h Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:08 -07:00
Junio C Hamano	6047b28eb7	Merge branch 'en/header-split-cleanup' Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers	2023-04-06 13:38:31 -07:00
Junio C Hamano	72871b198f	Merge branch 'ab/remove-implicit-use-of-the-repository' Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository " argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-06 13:38:30 -07:00
Junio C Hamano	e7dca80692	Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository " argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-04 08:25:52 -07:00
Ævar Arnfjörð Bjarmason	bc726bd075	cocci: apply the "object-store.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "object-store.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:45 -07:00
Elijah Newren	d48be35ca6	write-or-die.h: move declarations for write-or-die.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:54 -07:00
Elijah Newren	32a8f51061	environment.h: move declarations for environment.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	f394e093df	treewide: be explicit about dependence on gettext.h Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:51 -07:00
Junio C Hamano	d0732a8120	Merge branch 'jk/unused-post-2.39-part2' More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...	2023-03-17 14:03:09 -07:00
Jeff King	74595cca21	serve: mark unused parameters in virtual functions Each v2 "serve" action has a virtual function for advertising and implementing the command. A few of these are so trivial that they don't need to look at their parameters, especially the "repository" parameter. We can mark them so that -Wunused-parameter doesn't complain. Note that upload_pack_v2() probably _should_ be using its repository pointer. But teaching the functions it calls to do so is non-trivial. Even using it for something as simple as reading config is tricky, both because it shares code with the v1 upload pack, and because the git_protected_config() mechanism it uses does not have a repo-specific interface. So we'll just annotate it for now, and cleaning it up can be part of the larger work to drop references to the_repository. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-24 09:13:29 -08:00
Jeff King	4b4e75dd4f	serve: use repository pointer to get config A few of the v2 "serve" callbacks ignore their repository parameter and read config using the_repository (either directly or implicitly by calling wrapper functions). This isn't a bug since the server code only handles a single main repository anyway (and indeed, if you look at the callers, these repository parameters will always be the_repository). But in the long run we want to get rid of the_repository, so let's take a tiny step in that direction. As a bonus, this silences some -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-24 09:13:29 -08:00
Elijah Newren	41771fa435	cache.h: remove dependence on hex.h; make other files include it explicitly Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:29 -08:00
Patrick Steinhardt	9b67eb6fbe	refs: get rid of global list of hidden refs We're about to add a new argument to git-rev-list(1) that allows it to add all references that are visible when taking `transfer.hideRefs` et al into account. This will require us to potentially parse multiple sets of hidden refs, which is not easily possible right now as there is only a single, global instance of the list of parsed hidden refs. Refactor `parse_hide_refs_config()` and `ref_is_hidden()` so that both take the list of hidden references as input and adjust callers to keep a local list, instead. This allows us to easily use multiple hidden-ref lists. Furthermore, it allows us to properly free this list before we exit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-17 16:22:51 -05:00
Junio C Hamano	298a958224	Merge branch 'jk/list-objects-filter-cleanup' A couple of bugfixes with code clean-up. * jk/list-objects-filter-cleanup: list-objects-filter: convert filter_spec to a strbuf list-objects-filter: add and use initializers list-objects-filter: handle null default filter spec list-objects-filter: don't memset after releasing filter struct	2022-09-19 14:35:24 -07:00

1 2 3 4 5 ...

505 commits