development/git - HydraGit

mirror of https://github.com/git/git synced 2024-11-05 18:59:29 +00:00

Author	SHA1	Message	Date
Junio C Hamano	39fe402d67	Merge branch 'tb/refs-exclusion-and-packed-refs' Enumerating refs in the packed-refs file, while excluding refs that match certain patterns, has been optimized. * tb/refs-exclusion-and-packed-refs: ls-refs.c: avoid enumerating hidden refs where possible upload-pack.c: avoid enumerating hidden refs where possible builtin/receive-pack.c: avoid enumerating hidden references refs.h: implement `hidden_refs_to_excludes()` refs.h: let `for_each_namespaced_ref()` take excluded patterns revision.h: store hidden refs in a `strvec` refs/packed-backend.c: add trace2 counters for jump list refs/packed-backend.c: implement jump lists to avoid excluded pattern(s) refs/packed-backend.c: refactor `find_reference_location()` refs: plumb `exclude_patterns` argument throughout builtin/for-each-ref.c: add `--exclude` option ref-filter.c: parameterize match functions over patterns ref-filter: add `ref_filter_clear()` ref-filter: clear reachable list pointers after freeing ref-filter.h: provide `REF_FILTER_INIT` refs.c: rename `ref_filter`	2023-07-21 13:47:26 -07:00
Taylor Blau	98456eff08	ls-refs.c: avoid enumerating hidden refs where possible In a similar fashion as in previous commits, teach `ls-refs` to avoid enumerating hidden references where possible. As before, this is linux.git with one hidden reference per commit. $ hyperfine -L v ,.compile 'git{v} -c protocol.version=2 ls-remote .' Benchmark 1: git -c protocol.version=2 ls-remote . Time (mean ± σ): 89.8 ms ± 0.6 ms [User: 84.3 ms, System: 5.7 ms] Range (min … max): 88.8 ms … 91.3 ms 32 runs Benchmark 2: git.compile -c protocol.version=2 ls-remote . Time (mean ± σ): 6.5 ms ± 0.1 ms [User: 2.4 ms, System: 4.3 ms] Range (min … max): 6.2 ms … 8.3 ms 397 runs Summary 'git.compile -c protocol.version=2 ls-remote .' ran 13.85 ± 0.33 times faster than 'git -c protocol.version=2 ls-remote .' Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	c45841fff8	revision.h: store hidden refs in a `strvec` In subsequent commits, it will be convenient to have a 'const char **' of hidden refs (matching `transfer.hiderefs`, `uploadpack.hideRefs`, etc.), instead of a `string_list`. Convert spots throughout the tree that store the list of hidden refs from a `string_list` to a `strvec`. Note that in `parse_hide_refs_config()` there is an ugly const-cast used to avoid an extra copy of each value before trimming any trailing slash characters. This could instead be written as: ref = xstrdup(value); len = strlen(ref); while (len && ref[len - 1] == '/') ref[--len] = '\0'; strvec_push(hide_refs, ref); free(ref); but the double-copy (once when calling `xstrdup()`, and another via `strvec_push()`) is wasteful. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:56 -07:00
Taylor Blau	b269ac53c0	refs: plumb `exclude_patterns` argument throughout The subsequent patch will want to access an optional `excluded_patterns` array within `refs/packed-backend.c` that will cull out certain references matching any of the given patterns on a best-effort basis. To do so, the refs subsystem needs to be updated to pass this value across a number of different locations. Prepare for a future patch by introducing this plumbing now, passing NULLs at top-level APIs in order to make that patch less noisy and more easily readable. Signed-off-by: Taylor Blau <me@ttaylorr.co> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-10 14:48:55 -07:00
Glen Choo	a4e7e317f8	config: add ctx arg to config_fn_t Add a new "const struct config_context ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-28 14:06:39 -07:00
Elijah Newren	d1cbe1e6d8	hash-ll.h: split out of hash.h to remove dependency on repository.h hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed. This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-24 12:47:32 -07:00
Elijah Newren	9875058870	treewide: remove cache.h inclusion due to environment.h changes Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:54 -07:00
Elijah Newren	32a8f51061	environment.h: move declarations for environment.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	f394e093df	treewide: be explicit about dependence on gettext.h Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:51 -07:00
Junio C Hamano	d0732a8120	Merge branch 'jk/unused-post-2.39-part2' More work towards -Wunused. * jk/unused-post-2.39-part2: (21 commits) help: mark unused parameter in git_unknown_cmd_config() run_processes_parallel: mark unused callback parameters userformat_want_item(): mark unused parameter for_each_commit_graft(): mark unused callback parameter rewrite_parents(): mark unused callback parameter fetch-pack: mark unused parameter in callback function notes: mark unused callback parameters prio-queue: mark unused parameters in comparison functions for_each_object: mark unused callback parameters list-objects: mark unused callback parameters mark unused parameters in signal handlers run-command: mark error routine parameters as unused mark "pointless" data pointers in callbacks ref-filter: mark unused callback parameters http-backend: mark unused parameters in virtual functions http-backend: mark argc/argv unused object-name: mark unused parameters in disambiguate callbacks serve: mark unused parameters in virtual functions serve: use repository pointer to get config ls-refs: drop config caching ...	2023-03-17 14:03:09 -07:00
Jeff King	4b4e75dd4f	serve: use repository pointer to get config A few of the v2 "serve" callbacks ignore their repository parameter and read config using the_repository (either directly or implicitly by calling wrapper functions). This isn't a bug since the server code only handles a single main repository anyway (and indeed, if you look at the callers, these repository parameters will always be the_repository). But in the long run we want to get rid of the_repository, so let's take a tiny step in that direction. As a bonus, this silences some -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-24 09:13:29 -08:00
Jeff King	c4716086d8	ls-refs: drop config caching The code for the v2 ls-refs command has an ensure_config_read() function that tries to read the lsrefs.unborn config only once and caches it in some static global variables. There's no real need for this caching. In any given process we'd only need the value twice (once to decide whether to advertise, and once if somebody runs the command). And since the config code already has its own cache, each access is only incurring a hash lookup and string comparison anyway. Since the values we set are going to be specific to the_repository, the globals we set are a mild anti-pattern. In practice it's not a bug (yet) since the server-side v2 code only handles a single repository anyway. But it doesn't hurt to take a small step in the right direction and model a good approach. Note that we currently set two booleans: advertise_unborn and allow_unborn. But we can get away with a single value, since "advertise" naturally implies "allow". That lets us just convert this to a function with a return value. Note that we still always read from the_repository; we'll deal with that in a follow-on patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-24 09:13:28 -08:00
Elijah Newren	41771fa435	cache.h: remove dependence on hex.h; make other files include it explicitly Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:29 -08:00
Jeff King	91e2ab1587	ls-refs: use repository parameter to iterate refs The ls_refs() function (for the v2 protocol command of the same name) takes a repository parameter (like all v2 commands), but ignores it. It should use it to access the refs. This isn't a bug in practice, since we only call this function when serving upload-pack from the main repository. But it's an awkward gotcha, and it causes -Wunused-parameter to complain. The main reason we don't use the repository parameter is that the ref iteration interface we call doesn't have a "refs_" variant that takes a ref_store. However we can easily add one. In fact, since there is only one other caller (in ref-filter.c), there is no need to maintain the non-repository wrapper; that caller can just use the_repository. It's still a long way from consistently using a repository object, but it's one small step in the right direction. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-12-13 22:16:22 +09:00
Patrick Steinhardt	9b67eb6fbe	refs: get rid of global list of hidden refs We're about to add a new argument to git-rev-list(1) that allows it to add all references that are visible when taking `transfer.hideRefs` et al into account. This will require us to potentially parse multiple sets of hidden refs, which is not easily possible right now as there is only a single, global instance of the list of parsed hidden refs. Refactor `parse_hide_refs_config()` and `ref_is_hidden()` so that both take the list of hidden references as input and adjust callers to keep a local list, instead. This allows us to easily use multiple hidden-ref lists. Furthermore, it allows us to properly free this list before we exit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-17 16:22:51 -05:00
Ævar Arnfjörð Bjarmason	5cf88fd8b0	git-compat-util.h: use "UNUSED", not "UNUSED(var)" As reported in [1] the "UNUSED(var)" macro introduced in `2174b8c75d` (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of `9b24034754` (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-01 10:49:48 -07:00
Jeff King	783a86c142	config: mark unused callback parameters The callback passed to git_config() must conform to a particular interface. But most callbacks don't actually look at the extra "void *data" parameter. Let's mark the unused parameters to make -Wunused-parameter happy. Note there's one unusual case here in get_remote_default() where we actually ignore the "value" parameter. That's because it's only checking whether the option is found at all, and not parsing its value. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:55 -07:00
Jean-Noël Avila	1a8aea857e	i18n: factorize "invalid value" messages Use the same message when an invalid value is passed to a command line option or a configuration variable. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-04 13:58:28 -08:00
Junio C Hamano	f6c075ad71	Merge branch 'jk/ref-paranoia' The ref iteration code used to optionally allow dangling refs to be shown, which has been tightened up. * jk/ref-paranoia: refs: drop "broken" flag from for_each_fullref_in() ref-filter: drop broken-ref code entirely ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN repack, prune: drop GIT_REF_PARANOIA settings refs: turn on GIT_REF_PARANOIA by default refs: omit dangling symrefs when using GIT_REF_PARANOIA refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag refs-internal.h: reorganize DO_FOR_EACH_* flag documentation refs-internal.h: move DO_FOR_EACH_* flags next to each other t5312: be more assertive about command failure t5312: test non-destructive repack t5312: create bogus ref as necessary t5312: drop "verbose" helper t5600: provide detached HEAD for corruption failures t5516: don't use HEAD ref for invalid ref-deletion tests t7900: clean up some more broken refs	2021-10-11 10:21:47 -07:00
Junio C Hamano	bb1677fc29	Merge branch 'jk/reduce-malloc-in-v2-servers' Code cleanup to limit memory consumption and tighten protocol message parsing. * jk/reduce-malloc-in-v2-servers: ls-refs: reject unknown arguments serve: reject commands used as capabilities serve: reject bogus v2 "command=ls-refs=foo" docs/protocol-v2: clarify some ls-refs ref-prefix details ls-refs: ignore very long ref-prefix counts serve: drop "keys" strvec serve: provide "receive" function for session-id capability serve: provide "receive" function for object-format capability serve: add "receive" method for v2 capabilities table serve: return capability "value" from get_capability() serve: rename is_command() to parse_command()	2021-09-28 13:06:53 -07:00
Jeff King	67985e4e4a	refs: drop "broken" flag from for_each_fullref_in() No callers pass in anything but "0" here. Likewise to our sibling functions. Note that some of them ferry along the flag, but none of their callers pass anything but "0" either. Nor is anybody likely to change that. Callers which really want to see all of the raw refs use for_each_rawref(). And anybody interested in iterating a subset of the refs will likely be happy to use the now-default behavior of showing broken refs, but omitting dangling symlinks. So we can get rid of this whole feature. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-27 12:36:45 -07:00
Junio C Hamano	5331af2352	Merge branch 'ab/serve-cleanup' Code clean-up around "git serve". * ab/serve-cleanup: upload-pack: document and rename --advertise-refs serve.[ch]: remove "serve_options", split up --advertise-refs code {upload,receive}-pack tests: add --advertise-refs tests serve.c: move version line to advertise_capabilities() serve: move transfer.advertiseSID check into session_id_advertise() serve.[ch]: don't pass "struct strvec *keys" to commands serve: use designated initializers transport: use designated initializers transport: rename "fetch" in transport_vtable to "fetch_refs" serve: mark has_capability() as static	2021-09-20 15:20:43 -07:00
Junio C Hamano	c2509c5407	Merge branch 'jv/pkt-line-batch' Reduce number of write(2) system calls while sending the ref advertisement. * jv/pkt-line-batch: upload-pack: use stdio in send_ref callbacks pkt-line: add stdio packet write functions	2021-09-20 15:20:41 -07:00
Jeff King	ccf094788c	ls-refs: reject unknown arguments The v2 ls-refs command may receive extra arguments from the client, one per pkt-line. The spec is pretty clear that the arguments must come from a specified set, but we silently ignore any unknown entries. For a well-behaved client this doesn't matter, but it makes testing and debugging more confusing. Let's tighten this up to match the spec. In theory this liberal behavior _could_ be useful for extending the protocol. But: - every other part of the protocol requires that the server first indicate that it supports the argument; this includes the fetch and object-info commands, plus the "unborn" capability added to ls-refs itself - it's not a very good extension mechanism anyway; without the server advertising support, clients would have no idea if the argument was silently ignored, or accepted and simply had no effect So we're not really losing anything by tightening this. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-15 12:25:19 -07:00
Jeff King	7f0e4f6ac2	ls-refs: ignore very long ref-prefix counts Because each "ref-prefix" capability from the client comes in its own pkt-line, there's no limit to the number of them that a misbehaving client may send. We read them all into a strvec, which means the client can waste arbitrary amounts of our memory by just sending us "ref-prefix foo" over and over. One possible solution is to just drop the connection when the limit is reached. If we set it high enough, then only misbehaving or malicious clients would hit it. But "high enough" is vague, and it's unfriendly if we guess wrong and a legitimate client hits this. But we can do better. Since supporting the ref-prefix capability is optional anyway, the client has to further cull the response based on their own patterns. So we can simply ignore the patterns once we cross a certain threshold. Note that we have to ignore _all_ patterns, not just the ones past our limit (since otherwise we'd send too little data). The limit here is fairly arbitrary, and probably much higher than anyone would need in practice. It might be worth limiting it further, if only because we check it linearly (so with "m" local refs and "n" patterns, we do "m * n" string comparisons). But if we care about optimizing this, an even better solution may be a more advanced data structure anyway. I didn't bother making the limit configurable, since it's so high and since Git should behave correctly in either case. It wouldn't be too hard to do, but it makes both the code and documentation more complex. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-15 12:25:19 -07:00
Junio C Hamano	0057847208	Merge branch 'ab/serve-cleanup' into jk/reduce-malloc-in-v2-servers * ab/serve-cleanup: upload-pack: document and rename --advertise-refs serve.[ch]: remove "serve_options", split up --advertise-refs code {upload,receive}-pack tests: add --advertise-refs tests serve.c: move version line to advertise_capabilities() serve: move transfer.advertiseSID check into session_id_advertise() serve.[ch]: don't pass "struct strvec *keys" to commands serve: use designated initializers transport: use designated initializers transport: rename "fetch" in transport_vtable to "fetch_refs" serve: mark has_capability() as static	2021-09-14 10:56:05 -07:00
Jacob Vosmaer	70afef5cdf	upload-pack: use stdio in send_ref callbacks In both protocol v0 and v2, upload-pack writes one pktline packet per advertised ref to stdout. That means one or two write(2) syscalls per ref. This is problematic if these writes become network sends with high overhead. This commit changes both send_ref callbacks to use buffered IO using stdio. To give an example of the impact: I set up a single-threaded loop that calls ls-remote (with HTTP and protocol v2) on a local GitLab instance, on a repository with 11K refs. When I switch from Git v2.32.0 to this patch, I see a 40% reduction in CPU time for Git, and 65% for Gitaly (GitLab's Git RPC service). So using buffered IO not only saves syscalls in upload-pack, it also saves time in things that consume upload-pack's output. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-01 10:20:39 -07:00
Patrick Steinhardt	f54b9f21ca	ls-refs: reuse buffer when sending refs In the initial reference advertisement, the Git server will first announce all of its references to the client. The logic is handled in `send_ref()`, which will allocate a new buffer for each refline it is about to send. This is quite wasteful: instead of allocating a new buffer each time, we can just reuse a buffer. Improve this by passing in a buffer via the `ls_refs_data` struct which is then reused on each reference. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-25 15:55:29 -07:00
Ævar Arnfjörð Bjarmason	28a592e4f4	serve.[ch]: don't pass "struct strvec *keys" to commands The serve.c API added in `ed10cb952d` (serve: introduce git-serve, 2018-03-15) was passing in the raw capabilities "keys", but nothing downstream of it ever used them. Let's remove that code because it's not needed. If we do end up needing to pass information about the advertisement in the future it'll make more sense to have serve.c parse the capabilities keys and pass the result of its parsing, rather than expecting expecting its API users to parse the same keys again. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-05 08:59:37 -07:00
Junio C Hamano	69571dfe21	Merge branch 'jt/clone-unborn-head' "git clone" tries to locally check out the branch pointed at by HEAD of the remote repository after it is done, but the protocol did not convey the information necessary to do so when copying an empty repository. The protocol v2 learned how to do so. * jt/clone-unborn-head: clone: respect remote unborn HEAD connect, transport: encapsulate arg in struct ls-refs: report unborn targets of symrefs	2021-02-17 17:21:40 -08:00
Junio C Hamano	6254fa1359	Merge branch 'tb/ls-refs-optim' The ls-refs protocol operation has been optimized to narrow the sub-hierarchy of refs/ it walks to produce response. * tb/ls-refs-optim: ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets ls-refs.c: initialize 'prefixes' before using it refs: expose 'for_each_fullref_in_prefixes'	2021-02-05 16:40:45 -08:00
Jonathan Tan	59e1205d16	ls-refs: report unborn targets of symrefs When cloning, we choose the default branch based on the remote HEAD. But if there is no remote HEAD reported (which could happen if the target of the remote HEAD is unborn), we'll fall back to using our local init.defaultBranch. Traditionally this hasn't been a big deal, because most repos used "master" as the default. But these days it is likely to cause confusion if the server and client implementations choose different values (e.g., if the remote started with "main", we may choose "master" locally, create commits there, and then the user is surprised when they push to "master" and not "main"). To solve this, the remote needs to communicate the target of the HEAD symref, even if it is unborn, and "git clone" needs to use this information. Currently, symrefs that have unborn targets (such as in this case) are not communicated by the protocol. Teach Git to advertise and support the "unborn" feature in "ls-refs" (by default, this is advertised, but server administrators may turn this off through the lsrefs.unborn config). This feature indicates that "ls-refs" supports the "unborn" argument; when it is specified, "ls-refs" will send the HEAD symref with the name of its unborn target. This change is only for protocol v2. A similar change for protocol v0 would require independent protocol design (there being no analogous position to signal support for "unborn") and client-side plumbing of the data required, so the scope of this patch set is limited to protocol v2. The client side will be updated to use this in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 13:49:53 -08:00
Taylor Blau	b3970c702c	ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets ls-refs performs a single revision walk over the whole ref namespace, and sends ones that match with one of the given ref prefixes down to the user. This can be expensive if there are many refs overall, but the portion of them covered by the given prefixes is small by comparison. To attempt to reduce the difference between the number of refs traversed, and the number of refs sent, only traverse references which are in the longest common prefix of the given prefixes. This is very reminiscent of the approach taken in `b31e2680c4` (ref-filter.c: find disjoint pattern prefixes, 2019-06-26) which does an analogous thing for multi-patterned 'git for-each-ref' invocations. The callback 'send_ref' is resilient to ignore extra patterns by discarding any arguments which do not begin with at least one of the specified prefixes. Similarly, the code introduced in `b31e2680c4` is resilient to stop early at metacharacters, but we only pass strict prefixes here. At worst we would return too many results, but the double checking done by send_ref will throw away anything that doesn't start with something in the prefix list. Finally, if no prefixes were provided, then implicitly add the empty string (which will match all references) since this matches the existing behavior (see the "no restrictions" comment in "ls-refs.c:ref_match()"). Original-patch-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 18:57:27 -08:00
Jacob Vosmaer	83befd3724	ls-refs.c: initialize 'prefixes' before using it Correctly initialize the "prefixes" strvec using strvec_init() instead of simply zeroing it via the earlier memset(). There's no way to trigger a crash, since the first 'ref-prefix' command will initialize the strvec via the 'ALLOC_GROW' in 'strvec_push_nodup()' (the alloc and nr variables are already zero'd, so the call to ALLOC_GROW is valid). If no "ref-prefix" command was given, then the call to 'ls-refs.c:ref_match()' will abort early after it reads the zero in 'prefixes->nr'. Likewise, strvec_clear() will only call free() on the array, which is NULL, so we're safe there, too. But, all of this is dangerous and requires more reasoning than it would if we simply called 'strvec_init()', so do that. Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 18:57:27 -08:00
Jeff King	36a317929b	refs: switch peel_ref() to peel_iterated_oid() The peel_ref() interface is confusing and error-prone: - it's typically used by ref iteration callbacks that have both a refname and oid. But since they pass only the refname, we may load the ref value from the filesystem again. This is inefficient, but also means we are open to a race if somebody simultaneously updates the ref. E.g., this: int some_ref_cb(const char refname, const struct object_id oid, ...) { if (!peel_ref(refname, &peeled)) printf("%s peels to %s", oid_to_hex(oid), oid_to_hex(&peeled); } could print nonsense. It is correct to say "refname peels to..." (you may see the "before" value or the "after" value, either of which is consistent), but mentioning both oids may be mixing before/after values. Worse, whether this is possible depends on whether the optimization to read from the current iterator value kicks in. So it is actually not possible with: for_each_ref(some_ref_cb); but it _is_ possible with: head_ref(some_ref_cb); which does not use the iterator mechanism (though in practice, HEAD should never peel to anything, so this may not be triggerable). - it must take a fully-qualified refname for the read_ref_full() code path to work. Yet we routinely pass it partial refnames from callbacks to for_each_tag_ref(), etc. This happens to work when iterating because there we do not call read_ref_full() at all, and only use the passed refname to check if it is the same as the iterator. But the requirements for the function parameters are quite unclear. Instead of taking a refname, let's instead take an oid. That fixes both problems. It's a little funny for a "ref" function not to involve refs at all. The key thing is that it's optimizing under the hood based on having access to the ref iterator. So let's change the name to make it clear why you'd want this function versus just peel_object(). There are two other directions I considered but rejected: - we could pass the peel information into the each_ref_fn callback. However, we don't know if the caller actually wants it or not. For packed-refs, providing it is essentially free. But for loose refs, we actually have to peel the object, which would be wasteful in most cases. We could likewise pass in a flag to the callback indicating whether the peeled information is known, but that complicates those callbacks, as they then have to decide whether to manually peel themselves. Plus it requires changing the interface of every callback, whether they care about peeling or not, and there are many of them. - we could make a function to return the peeled value of the current iterated ref (computing it if necessary), and BUG() otherwise. I.e.: int peel_current_iterated_ref(struct object_id *out); Each of the current callers is an each_ref_fn callback, so they'd mostly be happy. But: - we use those callbacks with functions like head_ref(), which do not use the iteration code. So we'd need to handle the fallback case there, anyway. - it's possible that a caller would want to call into generic code that sometimes is used during iteration and sometimes not. This encapsulates the logic to do the fast thing when possible, and fallback when necessary. The implementation is mostly obvious, but I want to call out a few things in the patch: - the test-tool coverage for peel_ref() is now meaningless, as it all collapses to a single peel_object() call (arguably they were pretty uninteresting before; the tricky part of that function is the fast-path we see during iteration, but these calls didn't trigger that). I've just dropped it entirely, though note that some other tests relied on the tags we created; I've moved that creation to the tests where it matters. - we no longer need to take a ref_store parameter, since we'd never look up a ref now. We do still rely on a global "current iterator" variable which _could_ be kept per-ref-store. But in practice this is only useful if there are multiple recursive iterations, at which point the more appropriate solution is probably a stack of iterators. No caller used the actual ref-store parameter anyway (they all call the wrapper that passes the_repository). - the original only kicked in the optimization when the "refname" pointer matched (i.e., not string comparison). We do likewise with the "oid" parameter here, but fall back to doing an actual oideq() call. This in theory lets us kick in the optimization more often, though in practice no current caller cares. It should never be wrong, though (peeling is a property of an object, so two refs pointing to the same object would peel identically). - the original took care not to touch the peeled out-parameter unless we found something to put in it. But no caller cares about this, and anyway, it is enforced by peel_object() itself (and even in the optimized iterator case, that's where we eventually end up). We can shorten the code and avoid an extra copy by just passing the out-parameter through the stack. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:51:31 -08:00
Jeff King	d70a9eb611	strvec: rename struct fields The "argc" and "argv" names made sense when the struct was argv_array, but now they're just confusing. Let's rename them to "nr" (which we use for counts elsewhere) and "v" (which is rather terse, but reads well when combined with typical variable names like "args.v"). Note that we have to update all of the callers immediately. Playing tricks with the preprocessor is hard here, because we wouldn't want to rewrite unrelated tokens. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-30 19:18:06 -07:00
Jeff King	ef8d7ac42a	strvec: convert more callers away from argv_array name We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts remaining files from the first half of the alphabet, to keep the diff to a manageable size. The conversion was done purely mechanically with: git ls-files '.c' '.h' \| xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' and then selectively staging files with "git add '[abcdefghjkl]*'". We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:18 -07:00
Jeff King	dbbcd44fb4	strvec: rename files from argv-array to strvec This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '.c' '.h' \| xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:17 -07:00
Jeff King	4845b77245	upload-pack: handle unexpected delim packets When processing the arguments list for a v2 ls-refs or fetch command, we loop like this: while (packet_reader_read(request) != PACKET_READ_FLUSH) { const char *arg = request->line; ...handle arg... } to read and handle packets until we see a flush. The hidden assumption here is that anything except PACKET_READ_FLUSH will give us valid packet data to read. But that's not true; PACKET_READ_DELIM or PACKET_READ_EOF will leave packet->line as NULL, and we'll segfault trying to look at it. Instead, we should follow the more careful model demonstrated on the client side (e.g., in process_capabilities_v2): keep looping as long as we get normal packets, and then make sure that we broke out of the loop due to a real flush. That fixes the segfault and correctly diagnoses any unexpected input from the client. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-27 12:18:48 -07:00
Jeff King	533e088250	upload-pack: strip namespace from symref data Since `7171d8c15f` (upload-pack: send symbolic ref information as capability, 2013-09-17), we've sent cloning and fetching clients special information about which branch HEAD is pointing to, so that they don't have to guess based on matching up commit ids. However, this feature has never worked properly with the GIT_NAMESPACE feature. Because upload-pack uses head_ref_namespaced(find_symref), we do find and report on refs/namespaces/foo/HEAD instead of the actual HEAD of the repo. This makes sense, since the branch pointed to by the top-level HEAD may not be advertised at all. But we do two things wrong: 1. We report the full name refs/namespaces/foo/HEAD, instead of just HEAD. Meaning no client is going to bother doing anything with that symref, since we're not otherwise advertising it. 2. We report the symref destination using its full name (e.g., refs/namespaces/foo/refs/heads/master). That's similarly useless to the client, who only saw "refs/heads/master" in the advertisement. We should be stripping the namespace prefix off of both places (which this patch fixes). Likely nobody noticed because we tend to do the right thing anyway. Bug (1) means that we said nothing about HEAD (just refs/namespace/foo/HEAD). And so the client half of the code, from `a45b5f0552` (connect: annotate refs with their symref information in get_remote_head(), 2013-09-17), does not annotate HEAD, and we use the fallback in guess_remote_head(), matching refs by object id. Which is usually right. It only falls down in ambiguous cases, like the one laid out in the included test. This also means that we don't have to worry about breaking anybody who was putting pre-stripped names into their namespace symrefs when we fix bug (2). Because of bug (1), nobody would have been using the symref we advertised in the first place (not to mention that those symrefs would have appeared broken for any non-namespaced access). Note that we have separate fixes here for the v0 and v2 protocols. The symref advertisement moved in v2 to be a part of the ls-refs command. This actually gets part (1) right, since the symref annotation piggy-backs on the existing ref advertisement, which is properly stripped. But it still needs a fix for part (2). The included tests cover both protocols. Reported-by: Bryan Turner <bturner@atlassian.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-28 10:02:00 -07:00
Junio C Hamano	9c96ab9872	Merge branch 'jt/namespaced-ls-refs-fix' Fix namespace support in protocol v2. * jt/namespaced-ls-refs-fix: ls-refs: filter refs using namespace-stripped name	2019-02-05 14:26:15 -08:00
Jonathan Tan	e2f41a0a5a	ls-refs: filter refs using namespace-stripped name If a user fetches refs/heads/master from a repo with namespace "ns", the remote is expected to (1) not send the real refs/heads/master, and (2) send refs/namespaces/ns/refs/heads/master with the name refs/heads/master. (1) indeed happens now, but not (2) - Git only sends refs that have the user-given prefix, but it checks them against the full name of the ref (the one starting with refs/namespaces), and not the namespace-stripped one. This is demonstrated by the patch in the test. Currently, it results in "fatal: couldn't find remote ref refs/heads/master" despite both unnamespaced and namespaced master being present. With the code change, it produces the expected result. Check the ref prefixes against the namespace-stripped name. This bug was discovered through applying patches [1] that override protocol.version to 2 in repositories when running tests, allowing us to notice differences in behavior across different protocol versions. [1] https://public-inbox.org/git/cover.1547677183.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-18 12:48:41 -08:00
Jeff King	e20b4192a3	upload-pack: support hidden refs with protocol v2 In the v2 protocol, upload-pack's advertisement has been moved to the "ls-refs" command. That command does not respect hidden-ref config (like transfer.hiderefs) at all, and advertises everything. While there are some features that are not supported in v2 (e.g., v2 always allows fetching any sha1 without respect to advertisements), the lack of this feature is not documented and is likely just a bug. Let's make it work, as otherwise upgrading a server to a v2-capable git will start exposing these refs that the repository admin has asked to remain hidden. Note that we assume we're operating on behalf of a fetch here, since that's the only thing implemented in v2 at this point. See the in-code comment. We'll have to figure out how this works when the v2 push protocol is designed (both here in ls-refs, but also rejecting updates to hidden refs). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-10 11:15:37 -08:00
Brandon Williams	72d0ea0056	ls-refs: introduce ls-refs server command Introduce the ls-refs server command. In protocol v2, the ls-refs command is used to request the ref advertisement from the server. Since it is a command which can be requested (as opposed to mandatory in v1), a client can sent a number of parameters in its request to limit the ref advertisement based on provided ref-prefixes. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-15 12:01:08 -07:00

44 commits