git/fetch-pack.c

2119 lines
58 KiB
C
Raw Normal View History

#include "cache.h"
#include "repository.h"
#include "config.h"
#include "lockfile.h"
#include "refs.h"
#include "pkt-line.h"
#include "commit.h"
#include "tag.h"
#include "exec-cmd.h"
#include "pack.h"
#include "sideband.h"
#include "fetch-pack.h"
#include "remote.h"
#include "run-command.h"
#include "connect.h"
#include "transport.h"
#include "version.h"
#include "oid-array.h"
#include "oidset.h"
#include "packfile.h"
#include "object-store.h"
fetch-pack: write shallow, then check connectivity When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 22:08:43 +00:00
#include "connected.h"
#include "fetch-negotiator.h"
fetch: implement fetch.fsck.* Implement support for fetch.fsck.* corresponding with the existing receive.fsck.*. This allows for pedantically cloning repositories with specific issues without turning off fetch.fsckObjects. One such repository is https://github.com/robbyrussell/oh-my-zsh.git which before this change will emit this error when cloned with fetch.fsckObjects: error: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes fatal: Error in object fatal: index-pack failed Now with fetch.fsck.zeroPaddedFilemode=warn we'll warn about that issue, but the clone will succeed: warning: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes warning: object a18c4d13c2a5fa2d4ecd5346c50e119b999b807d: zeroPaddedFilemode: contains zero-padded file modes warning: object 84df066176c8da3fd59b13731a86d90f4f1e5c9d: zeroPaddedFilemode: contains zero-padded file modes The motivation for this is to be able to turn on fetch.fsckObjects globally across a fleet of computers but still be able to manually clone various legacy repositories by either white-listing specific issues, or better yet whitelist specific objects. The use of --git-dir=* instead of -C in the tests could be considered somewhat archaic, but the tests I'm adding here are duplicating the corresponding receive.* tests with as few changes as possible. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-27 14:37:17 +00:00
#include "fsck.h"
#include "shallow.h"
fetch: teach independent negotiation (no packfile) Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04 21:16:01 +00:00
#include "commit-reach.h"
#include "commit-graph.h"
fetch-pack: ignore SIGPIPE when writing to index-pack When fetching, we send the incoming pack to index-pack (or unpack-objects) via the sideband demuxer. If index-pack hits an error (e.g., because an object fails fsck), then it will die immediately. This may cause us to get SIGPIPE on the fetch, as we're still trying to write pack contents from the sideband demuxer (which is typically a thread, and thus takes down the whole fetch process). You can see this in action with: ./t5702-protocol-v2.sh --stress --run=59 which ends with (wrapped for readability): test_must_fail: died by signal 13: git -c protocol.version=2 \ -c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \ clone http://127.0.0.1:5708/smart/http_parent http_child not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object This is mostly cosmetic. The actual error of interest (in this case, the object that failed the fsck check) comes from index-pack straight to stderr, so the user still sees it. They _might_ even see fetch-pack complaining about index-pack failing, because the main thread is racing with the sideband-demuxer. But they'll definitely see the signal death in the exit code, which is what the test is complaining about. We can make this more predictable by just ignoring SIGPIPE. The sideband demuxer uses write_or_die(), so it will notice and stop (gracefully, because we hook die_routine() to exit just the thread). And during this section we're not writing anywhere else where we'd be concerned about SIGPIPE preventing us from wasting effort writing to nowhere. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-19 20:58:55 +00:00
#include "sigchain.h"
static int transfer_unpack_limit = -1;
static int fetch_unpack_limit = -1;
static int unpack_limit = 100;
static int prefer_ofs_delta = 1;
static int no_done;
static int deepen_since_ok;
static int deepen_not_ok;
static int fetch_fsck_objects = -1;
static int transfer_fsck_objects = -1;
static int agent_supported;
static int server_supports_filtering;
static int advertise_sid;
static struct shallow_lock shallow_lock;
static const char *alternate_shallow_file;
static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
fetch: implement fetch.fsck.* Implement support for fetch.fsck.* corresponding with the existing receive.fsck.*. This allows for pedantically cloning repositories with specific issues without turning off fetch.fsckObjects. One such repository is https://github.com/robbyrussell/oh-my-zsh.git which before this change will emit this error when cloned with fetch.fsckObjects: error: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes fatal: Error in object fatal: index-pack failed Now with fetch.fsck.zeroPaddedFilemode=warn we'll warn about that issue, but the clone will succeed: warning: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes warning: object a18c4d13c2a5fa2d4ecd5346c50e119b999b807d: zeroPaddedFilemode: contains zero-padded file modes warning: object 84df066176c8da3fd59b13731a86d90f4f1e5c9d: zeroPaddedFilemode: contains zero-padded file modes The motivation for this is to be able to turn on fetch.fsckObjects globally across a fleet of computers but still be able to manually clone various legacy repositories by either white-listing specific issues, or better yet whitelist specific objects. The use of --git-dir=* instead of -C in the tests could be considered somewhat archaic, but the tests I'm adding here are duplicating the corresponding receive.* tests with as few changes as possible. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-27 14:37:17 +00:00
static struct strbuf fsck_msg_types = STRBUF_INIT;
static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
/* Remember to update object flag allocation in object.h */
#define COMPLETE (1U << 0)
#define ALTERNATE (1U << 1)
fetch: teach independent negotiation (no packfile) Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04 21:16:01 +00:00
#define COMMON (1U << 6)
#define REACH_SCRATCH (1U << 7)
/*
* After sending this many "have"s if we do not get any new ACK , we
* give up traversing our history.
*/
#define MAX_IN_VAIN 256
static int multi_ack, use_sideband;
/* Allow specifying sha1 if it is a ref tip. */
#define ALLOW_TIP_SHA1 01
/* Allow request of a sha1 if it is reachable from a ref (possibly hidden ref). */
#define ALLOW_REACHABLE_SHA1 02
static unsigned int allow_unadvertised_object_request;
__attribute__((format (printf, 2, 3)))
static inline void print_verbose(const struct fetch_pack_args *args,
const char *fmt, ...)
{
va_list params;
if (!args->verbose)
return;
va_start(params, fmt);
vfprintf(stderr, fmt, params);
va_end(params);
fputc('\n', stderr);
}
fetch-pack: cache results of for_each_alternate_ref We may run for_each_alternate_ref() twice, once in find_common() and once in everything_local(). This operation can be expensive, because it involves running a sub-process which must freshly load all of the alternate's refs from disk. Let's cache and reuse the results between the two calls. We can make some optimizations based on the particular use pattern in fetch-pack to keep our memory usage down. The first is that we only care about the sha1s, not the refs themselves. So it's OK to store only the sha1s, and to suppress duplicates. The natural fit would therefore be a sha1_array. However, sha1_array's de-duplication happens only after it has read and sorted all entries. It still stores each duplicate. For an alternate with a large number of refs pointing to the same commits, this is a needless expense. Instead, we'd prefer to eliminate duplicates before putting them in the cache, which implies using a hash. We can further note that fetch-pack will call parse_object() on each alternate sha1. We can therefore keep our cache as a set of pointers to "struct object". That gives us a place to put our "already seen" bit with an optimized hash lookup. And as a bonus, the object stores the sha1 for us, so pointer-to-object is all we need. There are two extra optimizations I didn't do here: - we actually store an array of pointer-to-object. Technically we could just walk the obj_hash table looking for entries with the ALTERNATE flag set (because our use case doesn't care about the order here). But that hash table may be mostly composed of non-ALTERNATE entries, so we'd waste time walking over them. So it would be a slight win in memory use, but a loss in CPU. - the items we pull out of the cache are actual "struct object"s, but then we feed "obj->sha1" to our sub-functions, which promptly call parse_object(). This second parse is cheap, because it starts with lookup_object() and will bail immediately when it sees we've already parsed the object. We could save the extra hash lookup, but it would involve refactoring the functions we call. It may or may not be worth the trouble. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-08 20:53:03 +00:00
struct alternate_object_cache {
struct object **items;
size_t nr, alloc;
};
static void cache_one_alternate(const struct object_id *oid,
fetch-pack: cache results of for_each_alternate_ref We may run for_each_alternate_ref() twice, once in find_common() and once in everything_local(). This operation can be expensive, because it involves running a sub-process which must freshly load all of the alternate's refs from disk. Let's cache and reuse the results between the two calls. We can make some optimizations based on the particular use pattern in fetch-pack to keep our memory usage down. The first is that we only care about the sha1s, not the refs themselves. So it's OK to store only the sha1s, and to suppress duplicates. The natural fit would therefore be a sha1_array. However, sha1_array's de-duplication happens only after it has read and sorted all entries. It still stores each duplicate. For an alternate with a large number of refs pointing to the same commits, this is a needless expense. Instead, we'd prefer to eliminate duplicates before putting them in the cache, which implies using a hash. We can further note that fetch-pack will call parse_object() on each alternate sha1. We can therefore keep our cache as a set of pointers to "struct object". That gives us a place to put our "already seen" bit with an optimized hash lookup. And as a bonus, the object stores the sha1 for us, so pointer-to-object is all we need. There are two extra optimizations I didn't do here: - we actually store an array of pointer-to-object. Technically we could just walk the obj_hash table looking for entries with the ALTERNATE flag set (because our use case doesn't care about the order here). But that hash table may be mostly composed of non-ALTERNATE entries, so we'd waste time walking over them. So it would be a slight win in memory use, but a loss in CPU. - the items we pull out of the cache are actual "struct object"s, but then we feed "obj->sha1" to our sub-functions, which promptly call parse_object(). This second parse is cheap, because it starts with lookup_object() and will bail immediately when it sees we've already parsed the object. We could save the extra hash lookup, but it would involve refactoring the functions we call. It may or may not be worth the trouble. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-08 20:53:03 +00:00
void *vcache)
{
struct alternate_object_cache *cache = vcache;
struct object *obj = parse_object(the_repository, oid);
fetch-pack: cache results of for_each_alternate_ref We may run for_each_alternate_ref() twice, once in find_common() and once in everything_local(). This operation can be expensive, because it involves running a sub-process which must freshly load all of the alternate's refs from disk. Let's cache and reuse the results between the two calls. We can make some optimizations based on the particular use pattern in fetch-pack to keep our memory usage down. The first is that we only care about the sha1s, not the refs themselves. So it's OK to store only the sha1s, and to suppress duplicates. The natural fit would therefore be a sha1_array. However, sha1_array's de-duplication happens only after it has read and sorted all entries. It still stores each duplicate. For an alternate with a large number of refs pointing to the same commits, this is a needless expense. Instead, we'd prefer to eliminate duplicates before putting them in the cache, which implies using a hash. We can further note that fetch-pack will call parse_object() on each alternate sha1. We can therefore keep our cache as a set of pointers to "struct object". That gives us a place to put our "already seen" bit with an optimized hash lookup. And as a bonus, the object stores the sha1 for us, so pointer-to-object is all we need. There are two extra optimizations I didn't do here: - we actually store an array of pointer-to-object. Technically we could just walk the obj_hash table looking for entries with the ALTERNATE flag set (because our use case doesn't care about the order here). But that hash table may be mostly composed of non-ALTERNATE entries, so we'd waste time walking over them. So it would be a slight win in memory use, but a loss in CPU. - the items we pull out of the cache are actual "struct object"s, but then we feed "obj->sha1" to our sub-functions, which promptly call parse_object(). This second parse is cheap, because it starts with lookup_object() and will bail immediately when it sees we've already parsed the object. We could save the extra hash lookup, but it would involve refactoring the functions we call. It may or may not be worth the trouble. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-08 20:53:03 +00:00
if (!obj || (obj->flags & ALTERNATE))
return;
obj->flags |= ALTERNATE;
ALLOC_GROW(cache->items, cache->nr + 1, cache->alloc);
cache->items[cache->nr++] = obj;
}
static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
void (*cb)(struct fetch_negotiator *,
struct object *))
fetch-pack: cache results of for_each_alternate_ref We may run for_each_alternate_ref() twice, once in find_common() and once in everything_local(). This operation can be expensive, because it involves running a sub-process which must freshly load all of the alternate's refs from disk. Let's cache and reuse the results between the two calls. We can make some optimizations based on the particular use pattern in fetch-pack to keep our memory usage down. The first is that we only care about the sha1s, not the refs themselves. So it's OK to store only the sha1s, and to suppress duplicates. The natural fit would therefore be a sha1_array. However, sha1_array's de-duplication happens only after it has read and sorted all entries. It still stores each duplicate. For an alternate with a large number of refs pointing to the same commits, this is a needless expense. Instead, we'd prefer to eliminate duplicates before putting them in the cache, which implies using a hash. We can further note that fetch-pack will call parse_object() on each alternate sha1. We can therefore keep our cache as a set of pointers to "struct object". That gives us a place to put our "already seen" bit with an optimized hash lookup. And as a bonus, the object stores the sha1 for us, so pointer-to-object is all we need. There are two extra optimizations I didn't do here: - we actually store an array of pointer-to-object. Technically we could just walk the obj_hash table looking for entries with the ALTERNATE flag set (because our use case doesn't care about the order here). But that hash table may be mostly composed of non-ALTERNATE entries, so we'd waste time walking over them. So it would be a slight win in memory use, but a loss in CPU. - the items we pull out of the cache are actual "struct object"s, but then we feed "obj->sha1" to our sub-functions, which promptly call parse_object(). This second parse is cheap, because it starts with lookup_object() and will bail immediately when it sees we've already parsed the object. We could save the extra hash lookup, but it would involve refactoring the functions we call. It may or may not be worth the trouble. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-08 20:53:03 +00:00
{
static int initialized;
static struct alternate_object_cache cache;
size_t i;
if (!initialized) {
for_each_alternate_ref(cache_one_alternate, &cache);
initialized = 1;
}
for (i = 0; i < cache.nr; i++)
cb(negotiator, cache.items[i]);
}
static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
int mark_tags_complete)
{
enum object_type type;
struct object_info info = { .typep = &type };
struct commit *commit;
commit = lookup_commit_in_graph(the_repository, oid);
if (commit)
return commit;
while (1) {
if (oid_object_info_extended(the_repository, oid, &info,
OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK))
return NULL;
if (type == OBJ_TAG) {
struct tag *tag = (struct tag *)
parse_object(the_repository, oid);
if (!tag->tagged)
return NULL;
if (mark_tags_complete)
tag->object.flags |= COMPLETE;
oid = &tag->tagged->oid;
} else {
break;
}
}
fetch-pack: speed up loading of refs via commit graph When doing reference negotiation, git-fetch-pack(1) is loading all refs from disk in order to determine which commits it has in common with the remote repository. This can be quite expensive in repositories with many references though: in a real-world repository with around 2.2 million refs, fetching a single commit by its ID takes around 44 seconds. Dominating the loading time is decompression and parsing of the objects which are referenced by commits. Given the fact that we only care about commits (or tags which can be peeled to one) in this context, there is thus an easy performance win by switching the parsing logic to make use of the commit graph in case we have one available. Like this, we avoid hitting the object database to parse these commits but instead only load them from the commit-graph. This results in a significant performance boost when executing git-fetch in said repository with 2.2 million refs: Benchmark #1: HEAD~: git fetch $remote $commit Time (mean ± σ): 44.168 s ± 0.341 s [User: 42.985 s, System: 1.106 s] Range (min … max): 43.565 s … 44.577 s 10 runs Benchmark #2: HEAD: git fetch $remote $commit Time (mean ± σ): 19.498 s ± 0.724 s [User: 18.751 s, System: 0.690 s] Range (min … max): 18.629 s … 20.454 s 10 runs Summary 'HEAD: git fetch $remote $commit' ran 2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-04 13:56:11 +00:00
if (type == OBJ_COMMIT) {
struct commit *commit = lookup_commit(the_repository, oid);
if (!commit || repo_parse_commit(the_repository, commit))
return NULL;
return commit;
}
return NULL;
}
static int rev_list_insert_ref(struct fetch_negotiator *negotiator,
const struct object_id *oid)
{
struct commit *c = deref_without_lazy_fetch(oid, 0);
if (c)
negotiator->add_tip(negotiator, c);
return 0;
}
static int rev_list_insert_ref_oid(const char *refname, const struct object_id *oid,
int flag, void *cb_data)
{
return rev_list_insert_ref(cb_data, oid);
}
enum ack_type {
NAK = 0,
ACK,
ACK_continue,
ACK_common,
ACK_ready
};
static void consume_shallow_list(struct fetch_pack_args *args,
struct packet_reader *reader)
{
if (args->stateless_rpc && args->deepen) {
/* If we sent a depth we will get back "duplicate"
* shallow and unshallow commands every time there
* is a block of have lines exchanged.
*/
while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
if (starts_with(reader->line, "shallow "))
continue;
if (starts_with(reader->line, "unshallow "))
continue;
die(_("git fetch-pack: expected shallow list"));
}
if (reader->status != PACKET_READ_FLUSH)
die(_("git fetch-pack: expected a flush packet after shallow list"));
}
}
static enum ack_type get_ack(struct packet_reader *reader,
struct object_id *result_oid)
{
pkt-line: provide a LARGE_PACKET_MAX static buffer Most of the callers of packet_read_line just read into a static 1000-byte buffer (callers which handle arbitrary binary data already use LARGE_PACKET_MAX). This works fine in practice, because: 1. The only variable-sized data in these lines is a ref name, and refs tend to be a lot shorter than 1000 characters. 2. When sending ref lines, git-core always limits itself to 1000 byte packets. However, the only limit given in the protocol specification in Documentation/technical/protocol-common.txt is LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in pack-protocol.txt, and then only describing what we write, not as a specific limit for readers. This patch lets us bump the 1000-byte limit to LARGE_PACKET_MAX. Even though git-core will never write a packet where this makes a difference, there are two good reasons to do this: 1. Other git implementations may have followed protocol-common.txt and used a larger maximum size. We don't bump into it in practice because it would involve very long ref names. 2. We may want to increase the 1000-byte limit one day. Since packets are transferred before any capabilities, it's difficult to do this in a backwards-compatible way. But if we bump the size of buffer the readers can handle, eventually older versions of git will be obsolete enough that we can justify bumping the writers, as well. We don't have plans to do this anytime soon, but there is no reason not to start the clock ticking now. Just bumping all of the reading bufs to LARGE_PACKET_MAX would waste memory. Instead, since most readers just read into a temporary buffer anyway, let's provide a single static buffer that all callers can use. We can further wrap this detail away by having the packet_read_line wrapper just use the buffer transparently and return a pointer to the static storage. That covers most of the cases, and the remaining ones already read into their own LARGE_PACKET_MAX buffers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 20:02:57 +00:00
int len;
const char *arg;
if (packet_reader_read(reader) != PACKET_READ_NORMAL)
die(_("git fetch-pack: expected ACK/NAK, got a flush packet"));
len = reader->pktlen;
if (!strcmp(reader->line, "NAK"))
return NAK;
if (skip_prefix(reader->line, "ACK ", &arg)) {
const char *p;
if (!parse_oid_hex(arg, result_oid, &p)) {
len -= p - reader->line;
if (len < 1)
return ACK;
if (strstr(p, "continue"))
return ACK_continue;
if (strstr(p, "common"))
return ACK_common;
if (strstr(p, "ready"))
return ACK_ready;
return ACK;
}
}
die(_("git fetch-pack: expected ACK/NAK, got '%s'"), reader->line);
}
static void send_request(struct fetch_pack_args *args,
int fd, struct strbuf *buf)
{
if (args->stateless_rpc) {
send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
packet_flush(fd);
} else {
if (write_in_full(fd, buf->buf, buf->len) < 0)
die_errno(_("unable to write to remote"));
}
}
static void insert_one_alternate_object(struct fetch_negotiator *negotiator,
struct object *obj)
{
rev_list_insert_ref(negotiator, &obj->oid);
}
#define INITIAL_FLUSH 16
#define PIPESAFE_FLUSH 32
#define LARGE_FLUSH 16384
static int next_flush(int stateless_rpc, int count)
{
if (stateless_rpc) {
if (count < LARGE_FLUSH)
count <<= 1;
else
count = count * 11 / 10;
} else {
if (count < PIPESAFE_FLUSH)
count <<= 1;
else
count += PIPESAFE_FLUSH;
}
return count;
}
static void mark_tips(struct fetch_negotiator *negotiator,
const struct oid_array *negotiation_tips)
{
int i;
if (!negotiation_tips) {
for_each_rawref(rev_list_insert_ref_oid, negotiator);
return;
}
for (i = 0; i < negotiation_tips->nr; i++)
rev_list_insert_ref(negotiator, &negotiation_tips->oid[i]);
return;
}
static int find_common(struct fetch_negotiator *negotiator,
struct fetch_pack_args *args,
int fd[2], struct object_id *result_oid,
struct ref *refs)
{
int fetching;
int count = 0, flushes = 0, flush_at = INITIAL_FLUSH, retval;
const struct object_id *oid;
unsigned in_vain = 0;
int got_continue = 0;
int got_ready = 0;
struct strbuf req_buf = STRBUF_INIT;
size_t state_len = 0;
struct packet_reader reader;
if (args->stateless_rpc && multi_ack == 1)
die(_("--stateless-rpc requires multi_ack_detailed"));
packet_reader_init(&reader, fd[0], NULL, 0,
PACKET_READ_CHOMP_NEWLINE |
PACKET_READ_DIE_ON_ERR_PACKET);
mark_tips(negotiator, args->negotiation_tips);
for_each_cached_alternate(negotiator, insert_one_alternate_object);
fetching = 0;
for ( ; refs ; refs = refs->next) {
struct object_id *remote = &refs->old_oid;
const char *remote_hex;
struct object *o;
/*
* If that object is complete (i.e. it is an ancestor of a
* local ref), we tell them we have it but do not have to
* tell them about its ancestors, which they already know
* about.
*
* We use lookup_object here because we are only
* interested in the case we *know* the object is
* reachable and we have already scanned it.
*/
if (((o = lookup_object(the_repository, remote)) != NULL) &&
(o->flags & COMPLETE)) {
continue;
}
remote_hex = oid_to_hex(remote);
if (!fetching) {
struct strbuf c = STRBUF_INIT;
if (multi_ack == 2) strbuf_addstr(&c, " multi_ack_detailed");
if (multi_ack == 1) strbuf_addstr(&c, " multi_ack");
if (no_done) strbuf_addstr(&c, " no-done");
if (use_sideband == 2) strbuf_addstr(&c, " side-band-64k");
if (use_sideband == 1) strbuf_addstr(&c, " side-band");
fetch, upload-pack: --deepen=N extends shallow boundary by N commits In git-fetch, --depth argument is always relative with the latest remote refs. This makes it a bit difficult to cover this use case, where the user wants to make the shallow history, say 3 levels deeper. It would work if remote refs have not moved yet, but nobody can guarantee that, especially when that use case is performed a couple months after the last clone or "git fetch --depth". Also, modifying shallow boundary using --depth does not work well with clones created by --since or --not. This patch fixes that. A new argument --deepen=<N> will add <N> more (*) parent commits to the current history regardless of where remote refs are. Have/Want negotiation is still respected. So if remote refs move, the server will send two chunks: one between "have" and "want" and another to extend shallow history. In theory, the client could send no "want"s in order to get the second chunk only. But the protocol does not allow that. Either you send no want lines, which means ls-remote; or you have to send at least one want line that carries deep-relative to the server.. The main work was done by Dongcan Jiang. I fixed it up here and there. And of course all the bugs belong to me. (*) We could even support --deepen=<N> where <N> is negative. In that case we can cut some history from the shallow clone. This operation (and --depth=<shorter depth>) does not require interaction with remote side (and more complicated to implement as a result). Helped-by: Duy Nguyen <pclouds@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-12 10:54:09 +00:00
if (args->deepen_relative) strbuf_addstr(&c, " deepen-relative");
if (args->use_thin_pack) strbuf_addstr(&c, " thin-pack");
if (args->no_progress) strbuf_addstr(&c, " no-progress");
if (args->include_tag) strbuf_addstr(&c, " include-tag");
if (prefer_ofs_delta) strbuf_addstr(&c, " ofs-delta");
if (deepen_since_ok) strbuf_addstr(&c, " deepen-since");
if (deepen_not_ok) strbuf_addstr(&c, " deepen-not");
if (agent_supported) strbuf_addf(&c, " agent=%s",
git_user_agent_sanitized());
if (advertise_sid)
strbuf_addf(&c, " session-id=%s", trace2_session_id());
if (args->filter_options.choice)
strbuf_addstr(&c, " filter");
packet_buf_write(&req_buf, "want %s%s\n", remote_hex, c.buf);
strbuf_release(&c);
} else
packet_buf_write(&req_buf, "want %s\n", remote_hex);
fetching++;
}
if (!fetching) {
strbuf_release(&req_buf);
packet_flush(fd[1]);
return 1;
}
if (is_repository_shallow(the_repository))
write_shallow_commits(&req_buf, 1, NULL);
if (args->depth > 0)
packet_buf_write(&req_buf, "deepen %d", args->depth);
if (args->deepen_since) {
timestamp_t max_age = approxidate(args->deepen_since);
packet_buf_write(&req_buf, "deepen-since %"PRItime, max_age);
}
if (args->deepen_not) {
int i;
for (i = 0; i < args->deepen_not->nr; i++) {
struct string_list_item *s = args->deepen_not->items + i;
packet_buf_write(&req_buf, "deepen-not %s", s->string);
}
}
if (server_supports_filtering && args->filter_options.choice) {
const char *spec =
expand_list_objects_filter_spec(&args->filter_options);
packet_buf_write(&req_buf, "filter %s", spec);
}
packet_buf_flush(&req_buf);
state_len = req_buf.len;
if (args->deepen) {
const char *arg;
struct object_id oid;
send_request(args, fd[1], &req_buf);
while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
if (skip_prefix(reader.line, "shallow ", &arg)) {
if (get_oid_hex(arg, &oid))
die(_("invalid shallow line: %s"), reader.line);
register_shallow(the_repository, &oid);
continue;
}
if (skip_prefix(reader.line, "unshallow ", &arg)) {
if (get_oid_hex(arg, &oid))
die(_("invalid unshallow line: %s"), reader.line);
if (!lookup_object(the_repository, &oid))
die(_("object not found: %s"), reader.line);
/* make sure that it is parsed as shallow */
if (!parse_object(the_repository, &oid))
die(_("error in object: %s"), reader.line);
if (unregister_shallow(&oid))
die(_("no shallow found: %s"), reader.line);
continue;
}
die(_("expected shallow/unshallow, got %s"), reader.line);
}
} else if (!args->stateless_rpc)
send_request(args, fd[1], &req_buf);
if (!args->stateless_rpc) {
/* If we aren't using the stateless-rpc interface
* we don't need to retain the headers.
*/
strbuf_setlen(&req_buf, 0);
state_len = 0;
}
trace2_region_enter("fetch-pack", "negotiation_v0_v1", the_repository);
flushes = 0;
retval = -1;
while ((oid = negotiator->next(negotiator))) {
packet_buf_write(&req_buf, "have %s\n", oid_to_hex(oid));
print_verbose(args, "have %s", oid_to_hex(oid));
in_vain++;
if (flush_at <= ++count) {
int ack;
packet_buf_flush(&req_buf);
send_request(args, fd[1], &req_buf);
strbuf_setlen(&req_buf, state_len);
flushes++;
flush_at = next_flush(args->stateless_rpc, count);
/*
* We keep one window "ahead" of the other side, and
* will wait for an ACK only on the next one
*/
if (!args->stateless_rpc && count == INITIAL_FLUSH)
continue;
consume_shallow_list(args, &reader);
do {
ack = get_ack(&reader, result_oid);
if (ack)
print_verbose(args, _("got %s %d %s"), "ack",
ack, oid_to_hex(result_oid));
switch (ack) {
case ACK:
flushes = 0;
multi_ack = 0;
retval = 0;
goto done;
case ACK_common:
case ACK_ready:
case ACK_continue: {
struct commit *commit =
lookup_commit(the_repository,
result_oid);
int was_common;
if (!commit)
die(_("invalid commit %s"), oid_to_hex(result_oid));
was_common = negotiator->ack(negotiator, commit);
if (args->stateless_rpc
&& ack == ACK_common
&& !was_common) {
/* We need to replay the have for this object
* on the next RPC request so the peer knows
* it is in common with us.
*/
const char *hex = oid_to_hex(result_oid);
packet_buf_write(&req_buf, "have %s\n", hex);
state_len = req_buf.len;
/*
* Reset in_vain because an ack
* for this commit has not been
* seen.
*/
in_vain = 0;
} else if (!args->stateless_rpc
|| ack != ACK_common)
in_vain = 0;
retval = 0;
got_continue = 1;
if (ack == ACK_ready)
got_ready = 1;
break;
}
}
} while (ack);
flushes--;
if (got_continue && MAX_IN_VAIN < in_vain) {
print_verbose(args, _("giving up"));
break; /* give up */
}
if (got_ready)
break;
}
}
done:
trace2_region_leave("fetch-pack", "negotiation_v0_v1", the_repository);
if (!got_ready || !no_done) {
packet_buf_write(&req_buf, "done\n");
send_request(args, fd[1], &req_buf);
}
print_verbose(args, _("done"));
if (retval != 0) {
multi_ack = 0;
flushes++;
}
strbuf_release(&req_buf);
if (!got_ready || !no_done)
consume_shallow_list(args, &reader);
while (flushes || multi_ack) {
int ack = get_ack(&reader, result_oid);
if (ack) {
print_verbose(args, _("got %s (%d) %s"), "ack",
ack, oid_to_hex(result_oid));
if (ack == ACK)
return 0;
multi_ack = 1;
continue;
}
flushes--;
}
/* it is no error to fetch into a completely empty repo */
return count ? retval : 0;
}
static struct commit_list *complete;
static int mark_complete(const struct object_id *oid)
{
struct commit *commit = deref_without_lazy_fetch(oid, 1);
if (commit && !(commit->object.flags & COMPLETE)) {
commit->object.flags |= COMPLETE;
commit_list_insert(commit, &complete);
}
return 0;
}
static int mark_complete_oid(const char *refname, const struct object_id *oid,
int flag, void *cb_data)
{
return mark_complete(oid);
}
static void mark_recent_complete_commits(struct fetch_pack_args *args,
timestamp_t cutoff)
{
while (complete && cutoff <= complete->item->date) {
print_verbose(args, _("Marking %s as complete"),
oid_to_hex(&complete->item->object.oid));
pop_most_recent_commit(&complete, COMPLETE);
}
}
static void add_refs_to_oidset(struct oidset *oids, struct ref *refs)
{
for (; refs; refs = refs->next)
oidset_insert(oids, &refs->old_oid);
}
static int is_unmatched_ref(const struct ref *ref)
{
struct object_id oid;
const char *p;
return ref->match_status == REF_NOT_MATCHED &&
!parse_oid_hex(ref->name, &oid, &p) &&
*p == '\0' &&
oideq(&oid, &ref->old_oid);
}
static void filter_refs(struct fetch_pack_args *args,
struct ref **refs,
struct ref **sought, int nr_sought)
{
struct ref *newlist = NULL;
struct ref **newtail = &newlist;
struct ref *unmatched = NULL;
struct ref *ref, *next;
struct oidset tip_oids = OIDSET_INIT;
int i;
int strict = !(allow_unadvertised_object_request &
(ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1));
i = 0;
for (ref = *refs; ref; ref = next) {
int keep = 0;
next = ref->next;
if (starts_with(ref->name, "refs/") &&
fetch: do not consider peeled tags as advertised tips Our filter_refs() function accidentally considers the target of a peeled tag to be advertised by the server, even though upload-pack on the server side does not consider it so. This can result in the client making a bogus fetch to the server, which will end with the server complaining "not our ref". Whereas the correct behavior is for the client to notice that the server will not allow the request and error out immediately. So as bugs go, this is not very serious (the outcome is the same either way -- the fetch fails). But it's worth making the logic here correct and consistent with other related cases (e.g., fetching an oid that the server did not mention at all). The crux of the issue comes from fdb69d33c4 (fetch-pack: always allow fetching of literal SHA1s, 2017-05-15). After that, the strategy of filter_refs() is basically: - for each advertised ref, try to match it with a "sought" ref provided by the user. Skip any malformed refs (which includes peeled values like "refs/tags/foo^{}"), and place any unmatched items onto the unmatched list. - if there are unmatched sought refs, then put all of the advertised tips into an oidset, including the unmatched ones. - for each sought ref, see if it's in the oidset, in which case it's legal for us to ask the server for it The problem is in the second step. Our list of unmatched refs includes the peeled refs, even though upload-pack does not allow them to be directly fetched. So the simplest fix would be to exclude them during that step. However, we can observe that the unmatched list isn't used for anything else, and is freed at the end. We can just free those malformed refs immediately. That saves us having to check each ref a second time to see if it's malformed. Note that this code only kicks in when "strict" is in effect. I.e., if we are using the v0 protocol and uploadpack.allowReachableSHA1InWant is not in effect. With v2, all oids are allowed, and we do not bother creating or consulting the oidset at all. To future-proof our test against the upcoming GIT_TEST_PROTOCOL_VERSION flag, we'll manually mark it as a v0-only test. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-13 05:57:37 +00:00
check_refname_format(ref->name, 0)) {
/*
* trash or a peeled value; do not even add it to
* unmatched list
*/
free_one_ref(ref);
continue;
} else {
while (i < nr_sought) {
int cmp = strcmp(ref->name, sought[i]->name);
if (cmp < 0)
break; /* definitely do not have it */
else if (cmp == 0) {
keep = 1; /* definitely have it */
sought[i]->match_status = REF_MATCHED;
}
i++;
}
fetch-pack: don't try to fetch peel values with --all When "fetch-pack --all" sees a tag-to-blob on the remote, it tries to fetch both the tag itself ("refs/tags/foo") and the peeled value that the remote advertises ("refs/tags/foo^{}"). Asking for the object pointed to by the latter can cause upload-pack to complain with "not our ref", since it does not mark the peeled objects with the OUR_REF (unless they were at the tip of some other ref). Arguably upload-pack _should_ be marking those peeled objects. But it never has in the past, since clients would generally just ask for the tag and expect to get the peeled value along with it. And that's how "git fetch" works, as well as older versions of "fetch-pack --all". The problem was introduced by 5f0fc64513 (fetch-pack: eliminate spurious error messages, 2012-09-09). Before then, the matching logic was something like: if (refname is ill-formed) do nothing else if (doing --all) always consider it matched else look through list of sought refs for a match That commit wanted to flip the order of the second two arms of that conditional. But we ended up with: if (refname is ill-formed) do nothing else look through list of sought refs for a match if (--all and no match so far) always consider it matched That means tha an ill-formed ref will trigger the --all conditional block, even though we should just be ignoring it. We can fix that by having a single "else" with all of the well-formed logic, that checks the sought refs and "--all" in the correct order. Reported-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-11 05:53:57 +00:00
if (!keep && args->fetch_all &&
(!args->deepen || !starts_with(ref->name, "refs/tags/")))
keep = 1;
}
if (keep) {
*newtail = ref;
ref->next = NULL;
newtail = &ref->next;
} else {
ref->next = unmatched;
unmatched = ref;
}
}
if (strict) {
for (i = 0; i < nr_sought; i++) {
ref = sought[i];
if (!is_unmatched_ref(ref))
continue;
add_refs_to_oidset(&tip_oids, unmatched);
add_refs_to_oidset(&tip_oids, newlist);
break;
}
}
/* Append unmatched requests to the list */
for (i = 0; i < nr_sought; i++) {
ref = sought[i];
if (!is_unmatched_ref(ref))
continue;
if (!strict || oidset_contains(&tip_oids, &ref->old_oid)) {
ref->match_status = REF_MATCHED;
filter_ref: make a copy of extra "sought" entries If the server supports allow_tip_sha1_in_want, we add any unmatched raw-sha1 entries in our "sought" list of refs to the list of refs we will ask the other side for. We do so by inserting the original "struct ref" directly into our list, rather than making a copy. This has several problems. The most minor problem is that one cannot ever free the resulting list; it contains structs that are copies of the remote refs (made earlier by fetch_pack) along with sought refs that are referenced elsewhere. But more importantly that we set the ref->next pointer to NULL, chopping off the remainder of any existing list that the ref was a part of. We get the set of "sought" refs in an array rather than a linked list, but that array is often in turn generated from a list. The test modification in t5516 demonstrates this. Rather than fetching just an exact sha1, we fetch that sha1 plus another ref: - we build a linked list of refs to fetch when do_fetch calls get_ref_map; the exact sha1 is first, followed by the named ref ("refs/heads/extra" in this case). - we pass that linked list to transport_fetch_ref, which squashes it into an array of pointers - that array goes to fetch_pack, which calls filter_ref. There we generate the want list from a mix of what the remote side has advertised, and the "sought" entry for the exact sha1. We set the sought entry's "next" pointer to NULL. - after we return from transport_fetch_refs, we then try to update the refs by following the linked list. But our list is now truncated, and we do not update refs/heads/extra at all. We can fix this by making a copy of the ref. There's nothing that fetch_pack does to it that must be reflected in the original "sought" list (and indeed, if that were the case we would have a serious bug, because it is only exact-sha1 entries which are treated this way). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-19 20:37:09 +00:00
*newtail = copy_ref(ref);
newtail = &(*newtail)->next;
} else {
ref->match_status = REF_UNADVERTISED_NOT_ALLOWED;
}
}
oidset_clear(&tip_oids);
free_refs(unmatched);
*refs = newlist;
}
static void mark_alternate_complete(struct fetch_negotiator *unused,
struct object *obj)
{
mark_complete(&obj->oid);
}
struct loose_object_iter {
struct oidset *loose_object_set;
struct ref *refs;
};
/*
* Mark recent commits available locally and reachable from a local ref as
* COMPLETE.
*
* The cutoff time for recency is determined by this heuristic: it is the
* earliest commit time of the objects in refs that are commits and that we know
* the commit time of.
*/
static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
struct fetch_pack_args *args,
struct ref **refs)
{
struct ref *ref;
int old_save_commit_buffer = save_commit_buffer;
timestamp_t cutoff = 0;
save_commit_buffer = 0;
trace2_region_enter("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
for (ref = *refs; ref; ref = ref->next) {
struct object *o;
if (!has_object_file_with_flags(&ref->old_oid,
OBJECT_INFO_QUICK |
OBJECT_INFO_SKIP_FETCH_OBJECT))
continue;
o = parse_object(the_repository, &ref->old_oid);
if (!o)
continue;
/*
* We already have it -- which may mean that we were
* in sync with the other side at some time after
* that (it is OK if we guess wrong here).
*/
if (o->type == OBJ_COMMIT) {
struct commit *commit = (struct commit *)o;
if (!cutoff || cutoff < commit->date)
cutoff = commit->date;
}
}
trace2_region_leave("fetch-pack", "parse_remote_refs_and_find_cutoff", NULL);
/*
* This block marks all local refs as COMPLETE, and then recursively marks all
* parents of those refs as COMPLETE.
*/
trace2_region_enter("fetch-pack", "mark_complete_local_refs", NULL);
fetch-pack: avoid object flags if no_dependents When fetch_pack() is invoked as part of another Git command (due to a lazy fetch from a partial clone, for example), it uses object flags that may already be used by the outer Git command. The commit that introduced the lazy fetch feature (88e2f9ed8e ("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried to avoid this overlap, but it did not avoid it totally. It was successful in avoiding writing COMPLETE, but did not avoid reading COMPLETE, and did not avoid writing and reading ALTERNATE. Ensure that no flags are written or read by fetch_pack() in the case where it is used to perform a lazy fetch. To do this, it is sufficient to avoid checking completeness of wanted refs (unnecessary in the case of lazy fetches), and to avoid negotiation-related work (in the current implementation, already, no negotiation is performed). After that was done, the lack of overlap was verified by checking all direct and indirect usages of COMPLETE and ALTERNATE - that they are read or written only if no_dependents is false. There are other possible solutions to this issue: (1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using part, and whenever no_dependents is set, only use the non-flag-using part. (2) Make fetch_pack() be able to be used with arbitrary repository objects. fetch_pack() should then create its own repository object based on the given repository object, with its own object hashtable, so that the flags do not conflict. (1) is possible but invasive - some functions would need to be split; and such invasiveness would potentially be unnecessary if we ever were to need (2) anyway. (2) would be useful if we were to support, say, submodules that were partial clones themselves, but I don't know when or if the Git project plans to support those. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-03 23:04:52 +00:00
if (!args->deepen) {
for_each_rawref(mark_complete_oid, NULL);
fetch-pack: avoid object flags if no_dependents When fetch_pack() is invoked as part of another Git command (due to a lazy fetch from a partial clone, for example), it uses object flags that may already be used by the outer Git command. The commit that introduced the lazy fetch feature (88e2f9ed8e ("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried to avoid this overlap, but it did not avoid it totally. It was successful in avoiding writing COMPLETE, but did not avoid reading COMPLETE, and did not avoid writing and reading ALTERNATE. Ensure that no flags are written or read by fetch_pack() in the case where it is used to perform a lazy fetch. To do this, it is sufficient to avoid checking completeness of wanted refs (unnecessary in the case of lazy fetches), and to avoid negotiation-related work (in the current implementation, already, no negotiation is performed). After that was done, the lack of overlap was verified by checking all direct and indirect usages of COMPLETE and ALTERNATE - that they are read or written only if no_dependents is false. There are other possible solutions to this issue: (1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using part, and whenever no_dependents is set, only use the non-flag-using part. (2) Make fetch_pack() be able to be used with arbitrary repository objects. fetch_pack() should then create its own repository object based on the given repository object, with its own object hashtable, so that the flags do not conflict. (1) is possible but invasive - some functions would need to be split; and such invasiveness would potentially be unnecessary if we ever were to need (2) anyway. (2) would be useful if we were to support, say, submodules that were partial clones themselves, but I don't know when or if the Git project plans to support those. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-03 23:04:52 +00:00
for_each_cached_alternate(NULL, mark_alternate_complete);
commit_list_sort_by_date(&complete);
if (cutoff)
mark_recent_complete_commits(args, cutoff);
}
trace2_region_leave("fetch-pack", "mark_complete_local_refs", NULL);
fetch-pack: avoid object flags if no_dependents When fetch_pack() is invoked as part of another Git command (due to a lazy fetch from a partial clone, for example), it uses object flags that may already be used by the outer Git command. The commit that introduced the lazy fetch feature (88e2f9ed8e ("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried to avoid this overlap, but it did not avoid it totally. It was successful in avoiding writing COMPLETE, but did not avoid reading COMPLETE, and did not avoid writing and reading ALTERNATE. Ensure that no flags are written or read by fetch_pack() in the case where it is used to perform a lazy fetch. To do this, it is sufficient to avoid checking completeness of wanted refs (unnecessary in the case of lazy fetches), and to avoid negotiation-related work (in the current implementation, already, no negotiation is performed). After that was done, the lack of overlap was verified by checking all direct and indirect usages of COMPLETE and ALTERNATE - that they are read or written only if no_dependents is false. There are other possible solutions to this issue: (1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using part, and whenever no_dependents is set, only use the non-flag-using part. (2) Make fetch_pack() be able to be used with arbitrary repository objects. fetch_pack() should then create its own repository object based on the given repository object, with its own object hashtable, so that the flags do not conflict. (1) is possible but invasive - some functions would need to be split; and such invasiveness would potentially be unnecessary if we ever were to need (2) anyway. (2) would be useful if we were to support, say, submodules that were partial clones themselves, but I don't know when or if the Git project plans to support those. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-03 23:04:52 +00:00
/*
* Mark all complete remote refs as common refs.
* Don't mark them common yet; the server has to be told so first.
*/
trace2_region_enter("fetch-pack", "mark_common_remote_refs", NULL);
fetch-pack: avoid object flags if no_dependents When fetch_pack() is invoked as part of another Git command (due to a lazy fetch from a partial clone, for example), it uses object flags that may already be used by the outer Git command. The commit that introduced the lazy fetch feature (88e2f9ed8e ("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried to avoid this overlap, but it did not avoid it totally. It was successful in avoiding writing COMPLETE, but did not avoid reading COMPLETE, and did not avoid writing and reading ALTERNATE. Ensure that no flags are written or read by fetch_pack() in the case where it is used to perform a lazy fetch. To do this, it is sufficient to avoid checking completeness of wanted refs (unnecessary in the case of lazy fetches), and to avoid negotiation-related work (in the current implementation, already, no negotiation is performed). After that was done, the lack of overlap was verified by checking all direct and indirect usages of COMPLETE and ALTERNATE - that they are read or written only if no_dependents is false. There are other possible solutions to this issue: (1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using part, and whenever no_dependents is set, only use the non-flag-using part. (2) Make fetch_pack() be able to be used with arbitrary repository objects. fetch_pack() should then create its own repository object based on the given repository object, with its own object hashtable, so that the flags do not conflict. (1) is possible but invasive - some functions would need to be split; and such invasiveness would potentially be unnecessary if we ever were to need (2) anyway. (2) would be useful if we were to support, say, submodules that were partial clones themselves, but I don't know when or if the Git project plans to support those. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-03 23:04:52 +00:00
for (ref = *refs; ref; ref = ref->next) {
struct commit *c = deref_without_lazy_fetch(&ref->old_oid, 0);
if (!c || !(c->object.flags & COMPLETE))
fetch-pack: avoid object flags if no_dependents When fetch_pack() is invoked as part of another Git command (due to a lazy fetch from a partial clone, for example), it uses object flags that may already be used by the outer Git command. The commit that introduced the lazy fetch feature (88e2f9ed8e ("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried to avoid this overlap, but it did not avoid it totally. It was successful in avoiding writing COMPLETE, but did not avoid reading COMPLETE, and did not avoid writing and reading ALTERNATE. Ensure that no flags are written or read by fetch_pack() in the case where it is used to perform a lazy fetch. To do this, it is sufficient to avoid checking completeness of wanted refs (unnecessary in the case of lazy fetches), and to avoid negotiation-related work (in the current implementation, already, no negotiation is performed). After that was done, the lack of overlap was verified by checking all direct and indirect usages of COMPLETE and ALTERNATE - that they are read or written only if no_dependents is false. There are other possible solutions to this issue: (1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using part, and whenever no_dependents is set, only use the non-flag-using part. (2) Make fetch_pack() be able to be used with arbitrary repository objects. fetch_pack() should then create its own repository object based on the given repository object, with its own object hashtable, so that the flags do not conflict. (1) is possible but invasive - some functions would need to be split; and such invasiveness would potentially be unnecessary if we ever were to need (2) anyway. (2) would be useful if we were to support, say, submodules that were partial clones themselves, but I don't know when or if the Git project plans to support those. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-03 23:04:52 +00:00
continue;
negotiator->known_common(negotiator, c);
}
trace2_region_leave("fetch-pack", "mark_common_remote_refs", NULL);
save_commit_buffer = old_save_commit_buffer;
}
/*
* Returns 1 if every object pointed to by the given remote refs is available
* locally and reachable from a local ref, and 0 otherwise.
*/
static int everything_local(struct fetch_pack_args *args,
struct ref **refs)
{
struct ref *ref;
int retval;
for (retval = 1, ref = *refs; ref ; ref = ref->next) {
const struct object_id *remote = &ref->old_oid;
struct object *o;
o = lookup_object(the_repository, remote);
if (!o || !(o->flags & COMPLETE)) {
retval = 0;
print_verbose(args, "want %s (%s)", oid_to_hex(remote),
ref->name);
continue;
}
print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
ref->name);
}
return retval;
}
static int sideband_demux(int in, int out, void *data)
{
int *xd = data;
int ret;
ret = recv_sideband("fetch-pack", xd[0], out);
close(out);
return ret;
}
static void create_promisor_file(const char *keep_name,
struct ref **sought, int nr_sought)
{
struct strbuf promisor_name = STRBUF_INIT;
int suffix_stripped;
strbuf_addstr(&promisor_name, keep_name);
suffix_stripped = strbuf_strip_suffix(&promisor_name, ".keep");
if (!suffix_stripped)
BUG("name of pack lockfile should end with .keep (was '%s')",
keep_name);
strbuf_addstr(&promisor_name, ".promisor");
write_promisor_file(promisor_name.buf, sought, nr_sought);
strbuf_release(&promisor_name);
}
static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
{
int len = the_hash_algo->hexsz + 1; /* hash + NL */
do {
char hex_hash[GIT_MAX_HEXSZ + 1];
int read_len = read_in_full(fd, hex_hash, len);
struct object_id oid;
const char *end;
if (!read_len)
return;
if (read_len != len)
die("invalid length read %d", read_len);
if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
die("invalid hash");
oidset_insert(gitmodules_oids, &oid);
} while (1);
}
/*
* If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
* The strings to pass as the --index-pack-arg arguments to http-fetch will be
* stored there. (It must be freed by the caller.)
*/
static int get_pack(struct fetch_pack_args *args,
int xd[2], struct string_list *pack_lockfiles,
struct strvec *index_pack_args,
struct ref **sought, int nr_sought,
struct oidset *gitmodules_oids)
{
struct async demux;
int do_keep = args->keep_pack;
const char *cmd_name;
struct pack_header header;
int pass_header = 0;
struct child_process cmd = CHILD_PROCESS_INIT;
int fsck_objects = 0;
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-26 01:16:17 +00:00
int ret;
memset(&demux, 0, sizeof(demux));
if (use_sideband) {
/* xd[] is talking with upload-pack; subprocess reads from
* xd[0], spits out band#2 to stderr, and feeds us band#1
* through demux->out.
*/
demux.proc = sideband_demux;
demux.data = xd;
demux.out = -1;
demux.isolate_sigpipe = 1;
if (start_async(&demux))
die(_("fetch-pack: unable to fork off sideband demultiplexer"));
}
else
demux.out = xd[0];
fetch-pack: do not mix --pack_header and packfile uri When fetching (as opposed to cloning) from a repository with packfile URIs enabled, an error like this may occur: fatal: pack has bad object at offset 12: unknown object type 5 fatal: finish_http_pack_request gave result -1 fatal: fetch-pack: expected keep then TAB at start of http-fetch output This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs, use index-pack arg", 2021-02-22), when the index-pack args used when processing the inline packfile of a fetch response and when processing packfile URIs were unified. This bug happens because fetch, by default, partially reads (and consumes) the header of the inline packfile to determine if it should store the downloaded objects as a packfile or loose objects, and thus passes --pack_header=<...> to index-pack to inform it that some bytes are missing. However, when it subsequently fetches the additional packfiles linked by URIs, it reuses the same index-pack arguments, thus wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are missing. This does not happen when cloning because "git clone" always passes do_keep, which instructs the fetch mechanism to always retain the packfile, eliminating the need to read the header. There are a few ways to fix this, including filtering out pack_header arguments when downloading the additional packfiles, but I decided to stick to always using index-pack throughout when packfile URIs are present - thus, Git no longer needs to read the bytes, and no longer needs --pack_header here. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05 01:16:20 +00:00
if (!args->keep_pack && unpack_limit && !index_pack_args) {
if (read_pack_header(demux.out, &header))
die(_("protocol error: bad pack header"));
pass_header = 1;
if (ntohl(header.hdr_entries) < unpack_limit)
do_keep = 0;
else
do_keep = 1;
}
if (alternate_shallow_file) {
strvec_push(&cmd.args, "--shallow-file");
strvec_push(&cmd.args, alternate_shallow_file);
}
if (fetch_fsck_objects >= 0
? fetch_fsck_objects
: transfer_fsck_objects >= 0
? transfer_fsck_objects
: 0)
fsck_objects = 1;
if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
if (pack_lockfiles || fsck_objects)
cmd.out = -1;
cmd_name = "index-pack";
strvec_push(&cmd.args, cmd_name);
strvec_push(&cmd.args, "--stdin");
if (!args->quiet && !args->no_progress)
strvec_push(&cmd.args, "-v");
if (args->use_thin_pack)
strvec_push(&cmd.args, "--fix-thin");
fetch-pack: do not mix --pack_header and packfile uri When fetching (as opposed to cloning) from a repository with packfile URIs enabled, an error like this may occur: fatal: pack has bad object at offset 12: unknown object type 5 fatal: finish_http_pack_request gave result -1 fatal: fetch-pack: expected keep then TAB at start of http-fetch output This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs, use index-pack arg", 2021-02-22), when the index-pack args used when processing the inline packfile of a fetch response and when processing packfile URIs were unified. This bug happens because fetch, by default, partially reads (and consumes) the header of the inline packfile to determine if it should store the downloaded objects as a packfile or loose objects, and thus passes --pack_header=<...> to index-pack to inform it that some bytes are missing. However, when it subsequently fetches the additional packfiles linked by URIs, it reuses the same index-pack arguments, thus wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are missing. This does not happen when cloning because "git clone" always passes do_keep, which instructs the fetch mechanism to always retain the packfile, eliminating the need to read the header. There are a few ways to fix this, including filtering out pack_header arguments when downloading the additional packfiles, but I decided to stick to always using index-pack throughout when packfile URIs are present - thus, Git no longer needs to read the bytes, and no longer needs --pack_header here. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05 01:16:20 +00:00
if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
char hostname[HOST_NAME_MAX + 1];
if (xgethostname(hostname, sizeof(hostname)))
xsnprintf(hostname, sizeof(hostname), "localhost");
strvec_pushf(&cmd.args,
"--keep=fetch-pack %"PRIuMAX " on %s",
(uintmax_t)getpid(), hostname);
}
if (!index_pack_args && args->check_self_contained_and_connected)
strvec_push(&cmd.args, "--check-self-contained-and-connected");
else
/*
* We cannot perform any connectivity checks because
* not all packs have been downloaded; let the caller
* have this responsibility.
*/
args->check_self_contained_and_connected = 0;
if (args->from_promisor)
/*
* create_promisor_file() may be called afterwards but
* we still need index-pack to know that this is a
* promisor pack. For example, if transfer.fsckobjects
* is true, index-pack needs to know that .gitmodules
* is a promisor object (so that it won't complain if
* it is missing).
*/
strvec_push(&cmd.args, "--promisor");
}
else {
cmd_name = "unpack-objects";
strvec_push(&cmd.args, cmd_name);
if (args->quiet || args->no_progress)
strvec_push(&cmd.args, "-q");
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-26 01:16:17 +00:00
args->check_self_contained_and_connected = 0;
}
if (pass_header)
strvec_pushf(&cmd.args, "--pack_header=%"PRIu32",%"PRIu32,
ntohl(header.hdr_version),
ntohl(header.hdr_entries));
if (fsck_objects) {
if (args->from_promisor || index_pack_args)
/*
* We cannot use --strict in index-pack because it
* checks both broken objects and links, but we only
* want to check for broken objects.
*/
strvec_push(&cmd.args, "--fsck-objects");
else
strvec_pushf(&cmd.args, "--strict%s",
fsck_msg_types.buf);
}
if (index_pack_args) {
int i;
for (i = 0; i < cmd.args.nr; i++)
strvec_push(index_pack_args, cmd.args.v[i]);
}
fetch-pack: ignore SIGPIPE when writing to index-pack When fetching, we send the incoming pack to index-pack (or unpack-objects) via the sideband demuxer. If index-pack hits an error (e.g., because an object fails fsck), then it will die immediately. This may cause us to get SIGPIPE on the fetch, as we're still trying to write pack contents from the sideband demuxer (which is typically a thread, and thus takes down the whole fetch process). You can see this in action with: ./t5702-protocol-v2.sh --stress --run=59 which ends with (wrapped for readability): test_must_fail: died by signal 13: git -c protocol.version=2 \ -c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \ clone http://127.0.0.1:5708/smart/http_parent http_child not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object This is mostly cosmetic. The actual error of interest (in this case, the object that failed the fsck check) comes from index-pack straight to stderr, so the user still sees it. They _might_ even see fetch-pack complaining about index-pack failing, because the main thread is racing with the sideband-demuxer. But they'll definitely see the signal death in the exit code, which is what the test is complaining about. We can make this more predictable by just ignoring SIGPIPE. The sideband demuxer uses write_or_die(), so it will notice and stop (gracefully, because we hook die_routine() to exit just the thread). And during this section we're not writing anywhere else where we'd be concerned about SIGPIPE preventing us from wasting effort writing to nowhere. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-19 20:58:55 +00:00
sigchain_push(SIGPIPE, SIG_IGN);
cmd.in = demux.out;
cmd.git_cmd = 1;
if (start_command(&cmd))
die(_("fetch-pack: unable to fork off %s"), cmd_name);
if (do_keep && (pack_lockfiles || fsck_objects)) {
int is_well_formed;
char *pack_lockfile = index_pack_lockfile(cmd.out, &is_well_formed);
if (!is_well_formed)
die(_("fetch-pack: invalid index-pack output"));
if (pack_lockfile)
string_list_append_nodup(pack_lockfiles, pack_lockfile);
parse_gitmodules_oids(cmd.out, gitmodules_oids);
close(cmd.out);
}
if (!use_sideband)
/* Closed by start_command() */
xd[0] = -1;
clone: open a shortcut for connectivity check In order to make sure the cloned repository is good, we run "rev-list --objects --not --all $new_refs" on the repository. This is expensive on large repositories. This patch attempts to mitigate the impact in this special case. In the "good" clone case, we only have one pack. If all of the following are met, we can be sure that all objects reachable from the new refs exist, which is the intention of running "rev-list ...": - all refs point to an object in the pack - there are no dangling pointers in any object in the pack - no objects in the pack point to objects outside the pack The second and third checks can be done with the help of index-pack as a slight variation of --strict check (which introduces a new condition for the shortcut: pack transfer must be used and the number of objects large enough to call index-pack). The first is checked in check_everything_connected after we get an "ok" from index-pack. "index-pack + new checks" is still faster than the current "index-pack + rev-list", which is the whole point of this patch. If any of the conditions fail, we fall back to the good old but expensive "rev-list ..". In that case it's even more expensive because we have to pay for the new checks in index-pack. But that should only happen when the other side is either buggy or malicious. Cloning linux-2.6 over file:// before after real 3m25.693s 2m53.050s user 5m2.037s 4m42.396s sys 0m13.750s 0m16.574s A more realistic test with ssh:// over wireless before after real 11m26.629s 10m4.213s user 5m43.196s 5m19.444s sys 0m35.812s 0m37.630s This shortcut is not applied to shallow clones, partly because shallow clones should have no more objects than a usual fetch and the cost of rev-list is acceptable, partly to avoid dealing with corner cases when grafting is involved. This shortcut does not apply to unpack-objects code path either because the number of objects must be small in order to trigger that code path. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-26 01:16:17 +00:00
ret = finish_command(&cmd);
if (!ret || (args->check_self_contained_and_connected && ret == 1))
args->self_contained_and_connected =
args->check_self_contained_and_connected &&
ret == 0;
else
die(_("%s failed"), cmd_name);
if (use_sideband && finish_async(&demux))
die(_("error in sideband demultiplexer"));
fetch-pack: ignore SIGPIPE when writing to index-pack When fetching, we send the incoming pack to index-pack (or unpack-objects) via the sideband demuxer. If index-pack hits an error (e.g., because an object fails fsck), then it will die immediately. This may cause us to get SIGPIPE on the fetch, as we're still trying to write pack contents from the sideband demuxer (which is typically a thread, and thus takes down the whole fetch process). You can see this in action with: ./t5702-protocol-v2.sh --stress --run=59 which ends with (wrapped for readability): test_must_fail: died by signal 13: git -c protocol.version=2 \ -c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \ clone http://127.0.0.1:5708/smart/http_parent http_child not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object This is mostly cosmetic. The actual error of interest (in this case, the object that failed the fsck check) comes from index-pack straight to stderr, so the user still sees it. They _might_ even see fetch-pack complaining about index-pack failing, because the main thread is racing with the sideband-demuxer. But they'll definitely see the signal death in the exit code, which is what the test is complaining about. We can make this more predictable by just ignoring SIGPIPE. The sideband demuxer uses write_or_die(), so it will notice and stop (gracefully, because we hook die_routine() to exit just the thread). And during this section we're not writing anywhere else where we'd be concerned about SIGPIPE preventing us from wasting effort writing to nowhere. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-19 20:58:55 +00:00
sigchain_pop(SIGPIPE);
/*
* Now that index-pack has succeeded, write the promisor file using the
* obtained .keep filename if necessary
*/
if (do_keep && pack_lockfiles && pack_lockfiles->nr && args->from_promisor)
create_promisor_file(pack_lockfiles->items[0].string, sought, nr_sought);
return 0;
}
static int cmp_ref_by_name(const void *a_, const void *b_)
{
const struct ref *a = *((const struct ref **)a_);
const struct ref *b = *((const struct ref **)b_);
return strcmp(a->name, b->name);
}
static struct ref *do_fetch_pack(struct fetch_pack_args *args,
int fd[2],
const struct ref *orig_ref,
struct ref **sought, int nr_sought,
struct shallow_info *si,
struct string_list *pack_lockfiles)
{
struct repository *r = the_repository;
struct ref *ref = copy_ref_list(orig_ref);
struct object_id oid;
const char *agent_feature;
int agent_len;
struct fetch_negotiator negotiator_alloc;
struct fetch_negotiator *negotiator;
negotiator = &negotiator_alloc;
fetch_negotiator_init(r, negotiator);
sort_ref_list(&ref, ref_compare_name);
QSORT(sought, nr_sought, cmp_ref_by_name);
if ((agent_feature = server_feature_value("agent", &agent_len))) {
agent_supported = 1;
if (agent_len)
print_verbose(args, _("Server version is %.*s"),
agent_len, agent_feature);
}
if (!server_supports("session-id"))
advertise_sid = 0;
if (server_supports("shallow"))
print_verbose(args, _("Server supports %s"), "shallow");
else if (args->depth > 0 || is_repository_shallow(r))
die(_("Server does not support shallow clients"));
if (args->depth > 0 || args->deepen_since || args->deepen_not)
args->deepen = 1;
if (server_supports("multi_ack_detailed")) {
print_verbose(args, _("Server supports %s"), "multi_ack_detailed");
multi_ack = 2;
if (server_supports("no-done")) {
print_verbose(args, _("Server supports %s"), "no-done");
if (args->stateless_rpc)
no_done = 1;
}
}
else if (server_supports("multi_ack")) {
print_verbose(args, _("Server supports %s"), "multi_ack");
multi_ack = 1;
}
if (server_supports("side-band-64k")) {
print_verbose(args, _("Server supports %s"), "side-band-64k");
use_sideband = 2;
}
else if (server_supports("side-band")) {
print_verbose(args, _("Server supports %s"), "side-band");
use_sideband = 1;
}
if (server_supports("allow-tip-sha1-in-want")) {
print_verbose(args, _("Server supports %s"), "allow-tip-sha1-in-want");
allow_unadvertised_object_request |= ALLOW_TIP_SHA1;
}
if (server_supports("allow-reachable-sha1-in-want")) {
print_verbose(args, _("Server supports %s"), "allow-reachable-sha1-in-want");
allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
}
if (server_supports("thin-pack"))
print_verbose(args, _("Server supports %s"), "thin-pack");
else
args->use_thin_pack = 0;
if (server_supports("no-progress"))
print_verbose(args, _("Server supports %s"), "no-progress");
else
args->no_progress = 0;
if (server_supports("include-tag"))
print_verbose(args, _("Server supports %s"), "include-tag");
else
args->include_tag = 0;
if (server_supports("ofs-delta"))
print_verbose(args, _("Server supports %s"), "ofs-delta");
else
prefer_ofs_delta = 0;
if (server_supports("filter")) {
server_supports_filtering = 1;
print_verbose(args, _("Server supports %s"), "filter");
} else if (args->filter_options.choice) {
warning("filtering not recognized by server, ignoring");
}
if (server_supports("deepen-since")) {
print_verbose(args, _("Server supports %s"), "deepen-since");
deepen_since_ok = 1;
} else if (args->deepen_since)
die(_("Server does not support --shallow-since"));
if (server_supports("deepen-not")) {
print_verbose(args, _("Server supports %s"), "deepen-not");
deepen_not_ok = 1;
} else if (args->deepen_not)
die(_("Server does not support --shallow-exclude"));
if (server_supports("deepen-relative"))
print_verbose(args, _("Server supports %s"), "deepen-relative");
else if (args->deepen_relative)
fetch, upload-pack: --deepen=N extends shallow boundary by N commits In git-fetch, --depth argument is always relative with the latest remote refs. This makes it a bit difficult to cover this use case, where the user wants to make the shallow history, say 3 levels deeper. It would work if remote refs have not moved yet, but nobody can guarantee that, especially when that use case is performed a couple months after the last clone or "git fetch --depth". Also, modifying shallow boundary using --depth does not work well with clones created by --since or --not. This patch fixes that. A new argument --deepen=<N> will add <N> more (*) parent commits to the current history regardless of where remote refs are. Have/Want negotiation is still respected. So if remote refs move, the server will send two chunks: one between "have" and "want" and another to extend shallow history. In theory, the client could send no "want"s in order to get the second chunk only. But the protocol does not allow that. Either you send no want lines, which means ls-remote; or you have to send at least one want line that carries deep-relative to the server.. The main work was done by Dongcan Jiang. I fixed it up here and there. And of course all the bugs belong to me. (*) We could even support --deepen=<N> where <N> is negative. In that case we can cut some history from the shallow clone. This operation (and --depth=<shorter depth>) does not require interaction with remote side (and more complicated to implement as a result). Helped-by: Duy Nguyen <pclouds@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-12 10:54:09 +00:00
die(_("Server does not support --deepen"));
if (!server_supports_hash(the_hash_algo->name, NULL))
die(_("Server does not support this repository's object format"));
mark_complete_and_common_ref(negotiator, args, &ref);
filter_refs(args, &ref, sought, nr_sought);
if (everything_local(args, &ref)) {
packet_flush(fd[1]);
goto all_done;
}
if (find_common(negotiator, args, fd, &oid, ref) < 0)
if (!args->keep_pack)
/* When cloning, it is not unusual to have
* no common commit.
*/
warning(_("no common commits"));
if (args->stateless_rpc)
packet_flush(fd[1]);
if (args->deepen)
setup_alternate_shallow(&shallow_lock, &alternate_shallow_file,
NULL);
else if (si->nr_ours || si->nr_theirs) {
if (args->reject_shallow_remote)
die(_("source repository is shallow, reject to clone."));
alternate_shallow_file = setup_temporary_shallow(si->shallow);
} else
alternate_shallow_file = NULL;
if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought,
&fsck_options.gitmodules_found))
die(_("git fetch-pack: fetch failed."));
if (fsck_finish(&fsck_options))
die("fsck failed");
all_done:
if (negotiator)
negotiator->release(negotiator);
return ref;
}
static void add_shallow_requests(struct strbuf *req_buf,
const struct fetch_pack_args *args)
{
if (is_repository_shallow(the_repository))
write_shallow_commits(req_buf, 1, NULL);
if (args->depth > 0)
packet_buf_write(req_buf, "deepen %d", args->depth);
if (args->deepen_since) {
timestamp_t max_age = approxidate(args->deepen_since);
packet_buf_write(req_buf, "deepen-since %"PRItime, max_age);
}
if (args->deepen_not) {
int i;
for (i = 0; i < args->deepen_not->nr; i++) {
struct string_list_item *s = args->deepen_not->items + i;
packet_buf_write(req_buf, "deepen-not %s", s->string);
}
}
if (args->deepen_relative)
packet_buf_write(req_buf, "deepen-relative\n");
}
static void add_wants(const struct ref *wants, struct strbuf *req_buf)
{
int use_ref_in_want = server_supports_feature("fetch", "ref-in-want", 0);
for ( ; wants ; wants = wants->next) {
const struct object_id *remote = &wants->old_oid;
struct object *o;
/*
* If that object is complete (i.e. it is an ancestor of a
* local ref), we tell them we have it but do not have to
* tell them about its ancestors, which they already know
* about.
*
* We use lookup_object here because we are only
* interested in the case we *know* the object is
* reachable and we have already scanned it.
*/
if (((o = lookup_object(the_repository, remote)) != NULL) &&
(o->flags & COMPLETE)) {
continue;
}
if (!use_ref_in_want || wants->exact_oid)
packet_buf_write(req_buf, "want %s\n", oid_to_hex(remote));
else
packet_buf_write(req_buf, "want-ref %s\n", wants->name);
}
}
static void add_common(struct strbuf *req_buf, struct oidset *common)
{
struct oidset_iter iter;
const struct object_id *oid;
oidset_iter_init(common, &iter);
while ((oid = oidset_iter_next(&iter))) {
packet_buf_write(req_buf, "have %s\n", oid_to_hex(oid));
}
}
static int add_haves(struct fetch_negotiator *negotiator,
struct strbuf *req_buf,
int *haves_to_send)
{
int haves_added = 0;
const struct object_id *oid;
while ((oid = negotiator->next(negotiator))) {
packet_buf_write(req_buf, "have %s\n", oid_to_hex(oid));
if (++haves_added >= *haves_to_send)
break;
}
/* Increase haves to send on next round */
*haves_to_send = next_flush(1, *haves_to_send);
return haves_added;
}
static void write_fetch_command_and_capabilities(struct strbuf *req_buf,
const struct string_list *server_options)
{
const char *hash_name;
if (server_supports_v2("fetch", 1))
packet_buf_write(req_buf, "command=fetch");
if (server_supports_v2("agent", 0))
packet_buf_write(req_buf, "agent=%s", git_user_agent_sanitized());
if (advertise_sid && server_supports_v2("session-id", 0))
packet_buf_write(req_buf, "session-id=%s", trace2_session_id());
if (server_options && server_options->nr &&
server_supports_v2("server-option", 1)) {
int i;
for (i = 0; i < server_options->nr; i++)
packet_buf_write(req_buf, "server-option=%s",
server_options->items[i].string);
}
if (server_feature_v2("object-format", &hash_name)) {
int hash_algo = hash_algo_by_name(hash_name);
if (hash_algo_by_ptr(the_hash_algo) != hash_algo)
die(_("mismatched algorithms: client %s; server %s"),
the_hash_algo->name, hash_name);
packet_buf_write(req_buf, "object-format=%s", the_hash_algo->name);
} else if (hash_algo_by_ptr(the_hash_algo) != GIT_HASH_SHA1) {
die(_("the server does not support algorithm '%s'"),
the_hash_algo->name);
}
packet_buf_delim(req_buf);
}
static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
struct fetch_pack_args *args,
const struct ref *wants, struct oidset *common,
int *haves_to_send, int *in_vain,
int sideband_all, int seen_ack)
{
int haves_added;
int done_sent = 0;
struct strbuf req_buf = STRBUF_INIT;
write_fetch_command_and_capabilities(&req_buf, args->server_options);
if (args->use_thin_pack)
packet_buf_write(&req_buf, "thin-pack");
if (args->no_progress)
packet_buf_write(&req_buf, "no-progress");
if (args->include_tag)
packet_buf_write(&req_buf, "include-tag");
if (prefer_ofs_delta)
packet_buf_write(&req_buf, "ofs-delta");
if (sideband_all)
packet_buf_write(&req_buf, "sideband-all");
/* Add shallow-info and deepen request */
if (server_supports_feature("fetch", "shallow", 0))
add_shallow_requests(&req_buf, args);
else if (is_repository_shallow(the_repository) || args->deepen)
die(_("Server does not support shallow requests"));
/* Add filter */
if (server_supports_feature("fetch", "filter", 0) &&
args->filter_options.choice) {
const char *spec =
expand_list_objects_filter_spec(&args->filter_options);
print_verbose(args, _("Server supports filter"));
packet_buf_write(&req_buf, "filter %s", spec);
} else if (args->filter_options.choice) {
warning("filtering not recognized by server, ignoring");
}
if (server_supports_feature("fetch", "packfile-uris", 0)) {
int i;
struct strbuf to_send = STRBUF_INIT;
for (i = 0; i < uri_protocols.nr; i++) {
const char *s = uri_protocols.items[i].string;
if (!strcmp(s, "https") || !strcmp(s, "http")) {
if (to_send.len)
strbuf_addch(&to_send, ',');
strbuf_addstr(&to_send, s);
}
}
if (to_send.len) {
packet_buf_write(&req_buf, "packfile-uris %s",
to_send.buf);
strbuf_release(&to_send);
}
}
/* add wants */
add_wants(wants, &req_buf);
/* Add all of the common commits we've found in previous rounds */
add_common(&req_buf, common);
haves_added = add_haves(negotiator, &req_buf, haves_to_send);
*in_vain += haves_added;
if (!haves_added || (seen_ack && *in_vain >= MAX_IN_VAIN)) {
/* Send Done */
packet_buf_write(&req_buf, "done\n");
done_sent = 1;
}
/* Send request */
packet_buf_flush(&req_buf);
if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
die_errno(_("unable to write request to remote"));
strbuf_release(&req_buf);
return done_sent;
}
/*
* Processes a section header in a server's response and checks if it matches
* `section`. If the value of `peek` is 1, the header line will be peeked (and
* not consumed); if 0, the line will be consumed and the function will die if
* the section header doesn't match what was expected.
*/
static int process_section_header(struct packet_reader *reader,
const char *section, int peek)
{
int ret;
if (packet_reader_peek(reader) != PACKET_READ_NORMAL)
die(_("error reading section header '%s'"), section);
ret = !strcmp(reader->line, section);
if (!peek) {
if (!ret)
die(_("expected '%s', received '%s'"),
section, reader->line);
packet_reader_read(reader);
}
return ret;
}
static int process_ack(struct fetch_negotiator *negotiator,
struct packet_reader *reader,
struct object_id *common_oid,
int *received_ready)
{
while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
const char *arg;
if (!strcmp(reader->line, "NAK"))
continue;
if (skip_prefix(reader->line, "ACK ", &arg)) {
if (!get_oid_hex(arg, common_oid)) {
struct commit *commit;
commit = lookup_commit(the_repository, common_oid);
if (negotiator)
negotiator->ack(negotiator, commit);
}
return 1;
}
if (!strcmp(reader->line, "ready")) {
*received_ready = 1;
continue;
}
die(_("unexpected acknowledgment line: '%s'"), reader->line);
}
if (reader->status != PACKET_READ_FLUSH &&
reader->status != PACKET_READ_DELIM)
die(_("error processing acks: %d"), reader->status);
fetch-pack: be more precise in parsing v2 response Each section in a protocol v2 response is followed by either a DELIM packet (indicating more sections to follow) or a FLUSH packet (indicating none to follow). But when parsing the "acknowledgments" section, do_fetch_pack_v2() is liberal in accepting both, but determines whether to continue reading or not based solely on the contents of the "acknowledgments" section, not on whether DELIM or FLUSH was read. There is no issue with a protocol-compliant server, but can result in confusing error messages when communicating with a server that serves unexpected additional sections. Consider a server that sends "new-section" after "acknowledgments": - client writes request - client reads the "acknowledgments" section which contains no "ready", then DELIM - since there was no "ready", client needs to continue negotiation, and writes request - client reads "new-section", and reports to the end user "expected 'acknowledgments', received 'new-section'" For the person debugging the involved Git implementation(s), the error message is confusing in that "new-section" was not received in response to the latest request, but to the first one. One solution is to always continue reading after DELIM, but in this case, we can do better. We know from the protocol that "ready" means at least the packfile section is coming (hence, DELIM) and that no "ready" means that no sections are to follow (hence, FLUSH). So teach process_acks() to enforce this. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19 22:54:04 +00:00
/*
* If an "acknowledgments" section is sent, a packfile is sent if and
* only if "ready" was sent in this section. The other sections
* ("shallow-info" and "wanted-refs") are sent only if a packfile is
* sent. Therefore, a DELIM is expected if "ready" is sent, and a FLUSH
* otherwise.
*/
if (*received_ready && reader->status != PACKET_READ_DELIM)
fetch-pack: be more precise in parsing v2 response Each section in a protocol v2 response is followed by either a DELIM packet (indicating more sections to follow) or a FLUSH packet (indicating none to follow). But when parsing the "acknowledgments" section, do_fetch_pack_v2() is liberal in accepting both, but determines whether to continue reading or not based solely on the contents of the "acknowledgments" section, not on whether DELIM or FLUSH was read. There is no issue with a protocol-compliant server, but can result in confusing error messages when communicating with a server that serves unexpected additional sections. Consider a server that sends "new-section" after "acknowledgments": - client writes request - client reads the "acknowledgments" section which contains no "ready", then DELIM - since there was no "ready", client needs to continue negotiation, and writes request - client reads "new-section", and reports to the end user "expected 'acknowledgments', received 'new-section'" For the person debugging the involved Git implementation(s), the error message is confusing in that "new-section" was not received in response to the latest request, but to the first one. One solution is to always continue reading after DELIM, but in this case, we can do better. We know from the protocol that "ready" means at least the packfile section is coming (hence, DELIM) and that no "ready" means that no sections are to follow (hence, FLUSH). So teach process_acks() to enforce this. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19 22:54:04 +00:00
die(_("expected packfile to be sent after 'ready'"));
if (!*received_ready && reader->status != PACKET_READ_FLUSH)
fetch-pack: be more precise in parsing v2 response Each section in a protocol v2 response is followed by either a DELIM packet (indicating more sections to follow) or a FLUSH packet (indicating none to follow). But when parsing the "acknowledgments" section, do_fetch_pack_v2() is liberal in accepting both, but determines whether to continue reading or not based solely on the contents of the "acknowledgments" section, not on whether DELIM or FLUSH was read. There is no issue with a protocol-compliant server, but can result in confusing error messages when communicating with a server that serves unexpected additional sections. Consider a server that sends "new-section" after "acknowledgments": - client writes request - client reads the "acknowledgments" section which contains no "ready", then DELIM - since there was no "ready", client needs to continue negotiation, and writes request - client reads "new-section", and reports to the end user "expected 'acknowledgments', received 'new-section'" For the person debugging the involved Git implementation(s), the error message is confusing in that "new-section" was not received in response to the latest request, but to the first one. One solution is to always continue reading after DELIM, but in this case, we can do better. We know from the protocol that "ready" means at least the packfile section is coming (hence, DELIM) and that no "ready" means that no sections are to follow (hence, FLUSH). So teach process_acks() to enforce this. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19 22:54:04 +00:00
die(_("expected no other sections to be sent after no 'ready'"));
return 0;
}
static void receive_shallow_info(struct fetch_pack_args *args,
struct packet_reader *reader,
struct oid_array *shallows,
struct shallow_info *si)
{
int unshallow_received = 0;
process_section_header(reader, "shallow-info", 0);
while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
const char *arg;
struct object_id oid;
if (skip_prefix(reader->line, "shallow ", &arg)) {
if (get_oid_hex(arg, &oid))
die(_("invalid shallow line: %s"), reader->line);
oid_array_append(shallows, &oid);
continue;
}
if (skip_prefix(reader->line, "unshallow ", &arg)) {
if (get_oid_hex(arg, &oid))
die(_("invalid unshallow line: %s"), reader->line);
if (!lookup_object(the_repository, &oid))
die(_("object not found: %s"), reader->line);
/* make sure that it is parsed as shallow */
if (!parse_object(the_repository, &oid))
die(_("error in object: %s"), reader->line);
if (unregister_shallow(&oid))
die(_("no shallow found: %s"), reader->line);
unshallow_received = 1;
continue;
}
die(_("expected shallow/unshallow, got %s"), reader->line);
}
if (reader->status != PACKET_READ_FLUSH &&
reader->status != PACKET_READ_DELIM)
die(_("error processing shallow info: %d"), reader->status);
if (args->deepen || unshallow_received) {
/*
* Treat these as shallow lines caused by our depth settings.
* In v0, these lines cannot cause refs to be rejected; do the
* same.
*/
int i;
for (i = 0; i < shallows->nr; i++)
register_shallow(the_repository, &shallows->oid[i]);
setup_alternate_shallow(&shallow_lock, &alternate_shallow_file,
NULL);
args->deepen = 1;
} else if (shallows->nr) {
/*
* Treat these as shallow lines caused by the remote being
* shallow. In v0, remote refs that reach these objects are
* rejected (unless --update-shallow is set); do the same.
*/
prepare_shallow_info(si, shallows);
if (si->nr_ours || si->nr_theirs) {
if (args->reject_shallow_remote)
die(_("source repository is shallow, reject to clone."));
alternate_shallow_file =
setup_temporary_shallow(si->shallow);
} else
alternate_shallow_file = NULL;
} else {
alternate_shallow_file = NULL;
}
}
static int cmp_name_ref(const void *name, const void *ref)
{
return strcmp(name, (*(struct ref **)ref)->name);
}
fetch-pack: unify ref in and out param When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 20:13:20 +00:00
static void receive_wanted_refs(struct packet_reader *reader,
struct ref **sought, int nr_sought)
{
process_section_header(reader, "wanted-refs", 0);
while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
struct object_id oid;
const char *end;
struct ref **found;
if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' ')
die(_("expected wanted-ref, got '%s'"), reader->line);
found = bsearch(end, sought, nr_sought, sizeof(*sought),
cmp_name_ref);
if (!found)
die(_("unexpected wanted-ref: '%s'"), reader->line);
oidcpy(&(*found)->old_oid, &oid);
}
if (reader->status != PACKET_READ_DELIM)
die(_("error processing wanted refs: %d"), reader->status);
}
static void receive_packfile_uris(struct packet_reader *reader,
struct string_list *uris)
{
process_section_header(reader, "packfile-uris", 0);
while (packet_reader_read(reader) == PACKET_READ_NORMAL) {
if (reader->pktlen < the_hash_algo->hexsz ||
reader->line[the_hash_algo->hexsz] != ' ')
die("expected '<hash> <uri>', got: %s\n", reader->line);
string_list_append(uris, reader->line);
}
if (reader->status != PACKET_READ_DELIM)
die("expected DELIM");
}
enum fetch_state {
FETCH_CHECK_LOCAL = 0,
FETCH_SEND_REQUEST,
FETCH_PROCESS_ACKS,
FETCH_GET_PACK,
FETCH_DONE,
};
fetch: teach independent negotiation (no packfile) Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04 21:16:01 +00:00
static void do_check_stateless_delimiter(int stateless_rpc,
stateless-connect: send response end packet Currently, remote-curl acts as a proxy and blindly forwards packets between an HTTP server and fetch-pack. In the case of a stateless RPC connection where the connection is terminated before the transaction is complete, remote-curl will blindly forward the packets before waiting on more input from fetch-pack. Meanwhile, fetch-pack will read the transaction and continue reading, expecting more input to continue the transaction. This results in a deadlock between the two processes. This can be seen in the following command which does not terminate: $ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... whereas the v1 version does terminate as expected: $ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... fatal: the remote end hung up unexpectedly Instead of blindly forwarding packets, make remote-curl insert a response end packet after proxying the responses from the remote server when using stateless_connect(). On the RPC client side, ensure that each response ends as described. A separate control packet is chosen because we need to be able to differentiate between what the remote server sends and remote-curl's control packets. By ensuring in the remote-curl code that a server cannot send response end packets, we prevent a malicious server from being able to perform a denial of service attack in which they spoof a response end packet and cause the described deadlock to happen. Reported-by: Force Charlie <charlieio@outlook.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-19 10:54:00 +00:00
struct packet_reader *reader)
{
fetch: teach independent negotiation (no packfile) Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04 21:16:01 +00:00
check_stateless_delimiter(stateless_rpc, reader,
stateless-connect: send response end packet Currently, remote-curl acts as a proxy and blindly forwards packets between an HTTP server and fetch-pack. In the case of a stateless RPC connection where the connection is terminated before the transaction is complete, remote-curl will blindly forward the packets before waiting on more input from fetch-pack. Meanwhile, fetch-pack will read the transaction and continue reading, expecting more input to continue the transaction. This results in a deadlock between the two processes. This can be seen in the following command which does not terminate: $ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... whereas the v1 version does terminate as expected: $ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... fatal: the remote end hung up unexpectedly Instead of blindly forwarding packets, make remote-curl insert a response end packet after proxying the responses from the remote server when using stateless_connect(). On the RPC client side, ensure that each response ends as described. A separate control packet is chosen because we need to be able to differentiate between what the remote server sends and remote-curl's control packets. By ensuring in the remote-curl code that a server cannot send response end packets, we prevent a malicious server from being able to perform a denial of service attack in which they spoof a response end packet and cause the described deadlock to happen. Reported-by: Force Charlie <charlieio@outlook.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-19 10:54:00 +00:00
_("git fetch-pack: expected response end packet"));
}
static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
int fd[2],
const struct ref *orig_ref,
struct ref **sought, int nr_sought,
struct oid_array *shallows,
struct shallow_info *si,
struct string_list *pack_lockfiles)
{
struct repository *r = the_repository;
struct ref *ref = copy_ref_list(orig_ref);
enum fetch_state state = FETCH_CHECK_LOCAL;
struct oidset common = OIDSET_INIT;
struct packet_reader reader;
int in_vain = 0, negotiation_started = 0;
int haves_to_send = INITIAL_FLUSH;
struct fetch_negotiator negotiator_alloc;
struct fetch_negotiator *negotiator;
int seen_ack = 0;
struct object_id common_oid;
int received_ready = 0;
struct string_list packfile_uris = STRING_LIST_INIT_DUP;
int i;
struct strvec index_pack_args = STRVEC_INIT;
negotiator = &negotiator_alloc;
fetch_negotiator_init(r, negotiator);
packet_reader_init(&reader, fd[0], NULL, 0,
PACKET_READ_CHOMP_NEWLINE |
PACKET_READ_DIE_ON_ERR_PACKET);
if (git_env_bool("GIT_TEST_SIDEBAND_ALL", 1) &&
server_supports_feature("fetch", "sideband-all", 0)) {
reader.use_sideband = 1;
reader.me = "fetch-pack";
}
while (state != FETCH_DONE) {
switch (state) {
case FETCH_CHECK_LOCAL:
sort_ref_list(&ref, ref_compare_name);
QSORT(sought, nr_sought, cmp_ref_by_name);
/* v2 supports these by default */
allow_unadvertised_object_request |= ALLOW_REACHABLE_SHA1;
use_sideband = 2;
if (args->depth > 0 || args->deepen_since || args->deepen_not)
args->deepen = 1;
/* Filter 'ref' by 'sought' and those that aren't local */
mark_complete_and_common_ref(negotiator, args, &ref);
filter_refs(args, &ref, sought, nr_sought);
if (everything_local(args, &ref))
state = FETCH_DONE;
else
state = FETCH_SEND_REQUEST;
mark_tips(negotiator, args->negotiation_tips);
for_each_cached_alternate(negotiator,
insert_one_alternate_object);
break;
case FETCH_SEND_REQUEST:
if (!negotiation_started) {
negotiation_started = 1;
trace2_region_enter("fetch-pack",
"negotiation_v2",
the_repository);
}
if (send_fetch_request(negotiator, fd[1], args, ref,
&common,
&haves_to_send, &in_vain,
reader.use_sideband,
seen_ack))
state = FETCH_GET_PACK;
else
state = FETCH_PROCESS_ACKS;
break;
case FETCH_PROCESS_ACKS:
/* Process ACKs/NAKs */
process_section_header(&reader, "acknowledgments", 0);
while (process_ack(negotiator, &reader, &common_oid,
&received_ready)) {
in_vain = 0;
seen_ack = 1;
oidset_insert(&common, &common_oid);
}
if (received_ready) {
stateless-connect: send response end packet Currently, remote-curl acts as a proxy and blindly forwards packets between an HTTP server and fetch-pack. In the case of a stateless RPC connection where the connection is terminated before the transaction is complete, remote-curl will blindly forward the packets before waiting on more input from fetch-pack. Meanwhile, fetch-pack will read the transaction and continue reading, expecting more input to continue the transaction. This results in a deadlock between the two processes. This can be seen in the following command which does not terminate: $ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... whereas the v1 version does terminate as expected: $ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... fatal: the remote end hung up unexpectedly Instead of blindly forwarding packets, make remote-curl insert a response end packet after proxying the responses from the remote server when using stateless_connect(). On the RPC client side, ensure that each response ends as described. A separate control packet is chosen because we need to be able to differentiate between what the remote server sends and remote-curl's control packets. By ensuring in the remote-curl code that a server cannot send response end packets, we prevent a malicious server from being able to perform a denial of service attack in which they spoof a response end packet and cause the described deadlock to happen. Reported-by: Force Charlie <charlieio@outlook.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-19 10:54:00 +00:00
/*
* Don't check for response delimiter; get_pack() will
* read the rest of this response.
*/
state = FETCH_GET_PACK;
} else {
fetch: teach independent negotiation (no packfile) Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04 21:16:01 +00:00
do_check_stateless_delimiter(args->stateless_rpc, &reader);
state = FETCH_SEND_REQUEST;
}
break;
case FETCH_GET_PACK:
trace2_region_leave("fetch-pack",
"negotiation_v2",
the_repository);
/* Check for shallow-info section */
if (process_section_header(&reader, "shallow-info", 1))
receive_shallow_info(args, &reader, shallows, si);
if (process_section_header(&reader, "wanted-refs", 1))
fetch-pack: unify ref in and out param When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 20:13:20 +00:00
receive_wanted_refs(&reader, sought, nr_sought);
/* get the pack(s) */
if (git_env_bool("GIT_TRACE_REDACT", 1))
reader.options |= PACKET_READ_REDACT_URI_PATH;
if (process_section_header(&reader, "packfile-uris", 1))
receive_packfile_uris(&reader, &packfile_uris);
/* We don't expect more URIs. Reset to avoid expensive URI check. */
reader.options &= ~PACKET_READ_REDACT_URI_PATH;
process_section_header(&reader, "packfile", 0);
fetch-pack: signal v2 server that we are done making requests When fetching with the v0 protocol over ssh (or a local upload-pack with pipes), the server closes the connection as soon as it is finished sending the pack. So even though the client may still be operating on the data via index-pack (e.g., resolving deltas, checking connectivity, etc), the server has released all resources. With the v2 protocol, however, the server considers the ssh session only as a transport, with individual requests coming over it. After sending the pack, it goes back to its main loop, waiting for another request to come from the client. As a result, the ssh session hangs around until the client process ends, which may be much later (because resolving deltas, etc, may consume a lot of CPU). This is bad for two reasons: - it's consuming resources on the server to leave open a connection that won't see any more use - if something bad happens to the ssh connection in the meantime (say, it gets killed by the network because it's idle, as happened in a real-world report), then ssh will exit non-zero, and we'll propagate the error up the stack. The server is correct here not to hang up after serving the pack. The v2 protocol's design is meant to allow multiple requests like this, and hanging up would be the wrong thing for a hypothetical client which was planning to make more requests (though in practice, the git.git client never would, and I doubt any other implementations would either). The right thing is instead for the client to signal to the server that it's not interested in making more requests. We can do that by closing the pipe descriptor we use to write to ssh. This will propagate to the server upload-pack as an EOF when it tries to read the next request (and then it will close its half, and the whole connection will go away). It's important to do this "half duplex" shutdown, because we have to do it _before_ we actually receive the pack. This is an artifact of the way fetch-pack and index-pack (or unpack-objects) interact. We hand the connection off to index-pack (really, a sideband demuxer which feeds it), and then wait until it returns. And it doesn't do that until it has resolved all of the deltas in the pack, even though it was done reading from the server long before. So just closing the connection fully after index-pack returns would be too late; we'd have held it open much longer than was necessary. And teaching index-pack to close the connection is awkward. It's not even seeing the whole conversation (the sideband demuxer is, but it doesn't actually know what's in the packets, or when the end comes). Note that this close() is happening deep within the transport code. It's possible that a caller would want to perform other operations over the same ssh transport after receiving the pack. But as of the current code, none of the callers do, and there haven't been discussions of any plans to change this. If we need to support that later, we can probably do so by passing down a flag for "you're the last request on the transport; it's OK to close" instead of the code just assuming that's true. The description above all discusses v2 ssh, so it's worth thinking about how this interacts with other protocols: - in v0 protocols, we could do the same half-duplex shutdown (it just goes into the v0 do_fetch_pack() instead). This does work, but since it doesn't have the same persistence problem in the first place, there's little reason to change it at this point. - local fetches against git-upload-pack on the same machine will behave the same as ssh (they are talking over two pipes, and see EOF on their input pipe) - fetches against git-daemon will run this same code, and close one of the descriptors. In practice, this won't do anything, since there our two descriptors are dups of each other, and not part of a half-duplex pair. The right thing would probably be to call shutdown(SHUT_WR) on it. I didn't bother with that here. It doesn't face the same error-code problem (since it's just a TCP connection), so it's really only an optimization problem. And git:// is not that widely used these days, and has less impact on server resources than an ssh termination. - v2 http doesn't suffer from this problem in the first place, as our pipes terminate at a local git-remote-https, which is passing data along as individual requests via curl. Probably curl is keeping the TCP/TLS connection open for more requests, and we might be able to tell it manually "hey, we are done making requests now". But I think that's much less important. It again doesn't suffer from the error-code problem, and HTTP keepalive is pretty well understood (importantly, the timeouts can be set low, because clients like curl know how to reconnect for subsequent requests if necessary). So it's probably not worth figuring out how to tell curl that we're done (though if we do, this patch is probably the first step anyway; fetch-pack closes the pipe back to remote-https, which would be the signal that it should tell curl we're done). The code is pretty straightforward. We close the pipe at the right moment, and set it to -1 to mark it as invalid. I modified the later cleanup code to avoid calling close(-1). That's not strictly necessary, since close(-1) is a noop, but hopefully makes things a bit more obvious to a reader. I suspect that trying to call more transport functions after the close() (e.g., calling transport_fetch_refs() again) would fail, as it's not smart enough to realize we need to re-open the ssh connection. But that's already true when v0 is in use. And no current callers want to do that (and again, the solution is probably a flag in the transport code to keep things open, which can be added later). There's no test here, as the situation it covers is inherently racy (the question is when upload-pack exits, compared to when index-pack finishes resolving deltas and exits). The rather gross shell snippet below does recreate the problematic situation; when run on a sufficiently-large repository (git.git works fine), it kills an "idle" upload-pack while the client is resolving deltas, leading to a failed clone. ( git clone --no-local --progress . foo.git 2>&1 echo >&2 "clone exit code=$?" ) | tr '\r' '\n' | while read line do case "$done,$line" in ,Resolving*) echo "hit resolving deltas; killing upload-pack" killall -9 git-upload-pack done=t ;; esac done Reported-by: Greg Pflaum <greg.pflaum@pnp-hcl.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19 16:11:05 +00:00
/*
* this is the final request we'll make of the server;
* do a half-duplex shutdown to indicate that they can
* hang up as soon as the pack is sent.
*/
close(fd[1]);
fd[1] = -1;
if (get_pack(args, fd, pack_lockfiles,
packfile_uris.nr ? &index_pack_args : NULL,
sought, nr_sought, &fsck_options.gitmodules_found))
die(_("git fetch-pack: fetch failed."));
fetch: teach independent negotiation (no packfile) Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04 21:16:01 +00:00
do_check_stateless_delimiter(args->stateless_rpc, &reader);
state = FETCH_DONE;
break;
case FETCH_DONE:
continue;
}
}
for (i = 0; i < packfile_uris.nr; i++) {
int j;
struct child_process cmd = CHILD_PROCESS_INIT;
char packname[GIT_MAX_HEXSZ + 1];
const char *uri = packfile_uris.items[i].string +
the_hash_algo->hexsz + 1;
strvec_push(&cmd.args, "http-fetch");
strvec_pushf(&cmd.args, "--packfile=%.*s",
(int) the_hash_algo->hexsz,
packfile_uris.items[i].string);
for (j = 0; j < index_pack_args.nr; j++)
strvec_pushf(&cmd.args, "--index-pack-arg=%s",
index_pack_args.v[j]);
strvec_push(&cmd.args, uri);
cmd.git_cmd = 1;
cmd.no_stdin = 1;
cmd.out = -1;
if (start_command(&cmd))
die("fetch-pack: unable to spawn http-fetch");
if (read_in_full(cmd.out, packname, 5) < 0 ||
memcmp(packname, "keep\t", 5))
die("fetch-pack: expected keep then TAB at start of http-fetch output");
if (read_in_full(cmd.out, packname,
the_hash_algo->hexsz + 1) < 0 ||
packname[the_hash_algo->hexsz] != '\n')
die("fetch-pack: expected hash then LF at end of http-fetch output");
packname[the_hash_algo->hexsz] = '\0';
parse_gitmodules_oids(cmd.out, &fsck_options.gitmodules_found);
close(cmd.out);
if (finish_command(&cmd))
die("fetch-pack: unable to finish http-fetch");
if (memcmp(packfile_uris.items[i].string, packname,
the_hash_algo->hexsz))
die("fetch-pack: pack downloaded from %s does not match expected hash %.*s",
uri, (int) the_hash_algo->hexsz,
packfile_uris.items[i].string);
string_list_append_nodup(pack_lockfiles,
xstrfmt("%s/pack/pack-%s.keep",
get_object_directory(),
packname));
}
string_list_clear(&packfile_uris, 0);
strvec_clear(&index_pack_args);
if (fsck_finish(&fsck_options))
die("fsck failed");
if (negotiator)
negotiator->release(negotiator);
oidset_clear(&common);
return ref;
}
fetch: implement fetch.fsck.* Implement support for fetch.fsck.* corresponding with the existing receive.fsck.*. This allows for pedantically cloning repositories with specific issues without turning off fetch.fsckObjects. One such repository is https://github.com/robbyrussell/oh-my-zsh.git which before this change will emit this error when cloned with fetch.fsckObjects: error: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes fatal: Error in object fatal: index-pack failed Now with fetch.fsck.zeroPaddedFilemode=warn we'll warn about that issue, but the clone will succeed: warning: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes warning: object a18c4d13c2a5fa2d4ecd5346c50e119b999b807d: zeroPaddedFilemode: contains zero-padded file modes warning: object 84df066176c8da3fd59b13731a86d90f4f1e5c9d: zeroPaddedFilemode: contains zero-padded file modes The motivation for this is to be able to turn on fetch.fsckObjects globally across a fleet of computers but still be able to manually clone various legacy repositories by either white-listing specific issues, or better yet whitelist specific objects. The use of --git-dir=* instead of -C in the tests could be considered somewhat archaic, but the tests I'm adding here are duplicating the corresponding receive.* tests with as few changes as possible. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-27 14:37:17 +00:00
static int fetch_pack_config_cb(const char *var, const char *value, void *cb)
{
if (strcmp(var, "fetch.fsck.skiplist") == 0) {
const char *path;
if (git_config_pathname(&path, var, value))
return 1;
strbuf_addf(&fsck_msg_types, "%cskiplist=%s",
fsck_msg_types.len ? ',' : '=', path);
free((char *)path);
return 0;
}
if (skip_prefix(var, "fetch.fsck.", &var)) {
if (is_valid_msg_type(var, value))
strbuf_addf(&fsck_msg_types, "%c%s=%s",
fsck_msg_types.len ? ',' : '=', var, value);
else
warning("Skipping unknown msg id '%s'", var);
return 0;
}
return git_default_config(var, value, cb);
}
static void fetch_pack_config(void)
{
git_config_get_int("fetch.unpacklimit", &fetch_unpack_limit);
git_config_get_int("transfer.unpacklimit", &transfer_unpack_limit);
git_config_get_bool("repack.usedeltabaseoffset", &prefer_ofs_delta);
git_config_get_bool("fetch.fsckobjects", &fetch_fsck_objects);
git_config_get_bool("transfer.fsckobjects", &transfer_fsck_objects);
git_config_get_bool("transfer.advertisesid", &advertise_sid);
if (!uri_protocols.nr) {
char *str;
if (!git_config_get_string("fetch.uriprotocols", &str) && str) {
string_list_split(&uri_protocols, str, ',', -1);
free(str);
}
}
fetch: implement fetch.fsck.* Implement support for fetch.fsck.* corresponding with the existing receive.fsck.*. This allows for pedantically cloning repositories with specific issues without turning off fetch.fsckObjects. One such repository is https://github.com/robbyrussell/oh-my-zsh.git which before this change will emit this error when cloned with fetch.fsckObjects: error: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes fatal: Error in object fatal: index-pack failed Now with fetch.fsck.zeroPaddedFilemode=warn we'll warn about that issue, but the clone will succeed: warning: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes warning: object a18c4d13c2a5fa2d4ecd5346c50e119b999b807d: zeroPaddedFilemode: contains zero-padded file modes warning: object 84df066176c8da3fd59b13731a86d90f4f1e5c9d: zeroPaddedFilemode: contains zero-padded file modes The motivation for this is to be able to turn on fetch.fsckObjects globally across a fleet of computers but still be able to manually clone various legacy repositories by either white-listing specific issues, or better yet whitelist specific objects. The use of --git-dir=* instead of -C in the tests could be considered somewhat archaic, but the tests I'm adding here are duplicating the corresponding receive.* tests with as few changes as possible. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-27 14:37:17 +00:00
git_config(fetch_pack_config_cb, NULL);
}
static void fetch_pack_setup(void)
{
static int did_setup;
if (did_setup)
return;
fetch_pack_config();
if (0 <= transfer_unpack_limit)
unpack_limit = transfer_unpack_limit;
else if (0 <= fetch_unpack_limit)
unpack_limit = fetch_unpack_limit;
did_setup = 1;
}
static int remove_duplicates_in_refs(struct ref **ref, int nr)
{
struct string_list names = STRING_LIST_INIT_NODUP;
int src, dst;
for (src = dst = 0; src < nr; src++) {
struct string_list_item *item;
item = string_list_insert(&names, ref[src]->name);
if (item->util)
continue; /* already have it */
item->util = ref[src];
if (src != dst)
ref[dst] = ref[src];
dst++;
}
for (src = dst; src < nr; src++)
ref[src] = NULL;
string_list_clear(&names, 0);
return dst;
}
static void update_shallow(struct fetch_pack_args *args,
fetch-pack: unify ref in and out param When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 20:13:20 +00:00
struct ref **sought, int nr_sought,
struct shallow_info *si)
{
struct oid_array ref = OID_ARRAY_INIT;
int *status;
int i;
if (args->deepen && alternate_shallow_file) {
if (*alternate_shallow_file == '\0') { /* --unshallow */
unlink_or_warn(git_path_shallow(the_repository));
shallow.c: use '{commit,rollback}_shallow_file' In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily, 2019-01-10), the author noted that 'is_repository_shallow' produces visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'. This is a problem for e.g., fetching with '--update-shallow' in a shallow repository with 'fetch.writeCommitGraph' enabled, since the update to '.git/shallow' will cause Git to think that the repository isn't shallow when it is, thereby circumventing the commit-graph compatibility check. This causes problems in shallow repositories with at least shallow refs that have at least one ancestor (since the client won't have those objects, and therefore can't take the reachability closure over commits when writing a commit-graph). Address this by introducing thin wrappers over 'commit_lock_file' and 'rollback_lock_file' for use specifically when the lock is held over '.git/shallow'. These wrappers (appropriately called 'commit_shallow_file' and 'rollback_shallow_file') call into their respective functions in 'lockfile.h', but additionally reset validity checks used by the shallow machinery. Replace each instance of 'commit_lock_file' and 'rollback_lock_file' with 'commit_shallow_file' and 'rollback_shallow_file' when the lock being held is over the '.git/shallow' file. As a result, 'prune_shallow' can now only be called once (since 'check_shallow_file_for_update' will die after calling 'reset_repository_shallow'). But, this is OK since we only call 'prune_shallow' at most once per process. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23 00:25:45 +00:00
rollback_shallow_file(the_repository, &shallow_lock);
} else
shallow.c: use '{commit,rollback}_shallow_file' In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily, 2019-01-10), the author noted that 'is_repository_shallow' produces visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'. This is a problem for e.g., fetching with '--update-shallow' in a shallow repository with 'fetch.writeCommitGraph' enabled, since the update to '.git/shallow' will cause Git to think that the repository isn't shallow when it is, thereby circumventing the commit-graph compatibility check. This causes problems in shallow repositories with at least shallow refs that have at least one ancestor (since the client won't have those objects, and therefore can't take the reachability closure over commits when writing a commit-graph). Address this by introducing thin wrappers over 'commit_lock_file' and 'rollback_lock_file' for use specifically when the lock is held over '.git/shallow'. These wrappers (appropriately called 'commit_shallow_file' and 'rollback_shallow_file') call into their respective functions in 'lockfile.h', but additionally reset validity checks used by the shallow machinery. Replace each instance of 'commit_lock_file' and 'rollback_lock_file' with 'commit_shallow_file' and 'rollback_shallow_file' when the lock being held is over the '.git/shallow' file. As a result, 'prune_shallow' can now only be called once (since 'check_shallow_file_for_update' will die after calling 'reset_repository_shallow'). But, this is OK since we only call 'prune_shallow' at most once per process. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23 00:25:45 +00:00
commit_shallow_file(the_repository, &shallow_lock);
alternate_shallow_file = NULL;
return;
}
if (!si->shallow || !si->shallow->nr)
return;
if (args->cloning) {
/*
* remote is shallow, but this is a clone, there are
* no objects in repo to worry about. Accept any
* shallow points that exist in the pack (iow in repo
* after get_pack() and reprepare_packed_git())
*/
struct oid_array extra = OID_ARRAY_INIT;
struct object_id *oid = si->shallow->oid;
for (i = 0; i < si->shallow->nr; i++)
if (has_object_file(&oid[i]))
oid_array_append(&extra, &oid[i]);
if (extra.nr) {
setup_alternate_shallow(&shallow_lock,
&alternate_shallow_file,
&extra);
shallow.c: use '{commit,rollback}_shallow_file' In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily, 2019-01-10), the author noted that 'is_repository_shallow' produces visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'. This is a problem for e.g., fetching with '--update-shallow' in a shallow repository with 'fetch.writeCommitGraph' enabled, since the update to '.git/shallow' will cause Git to think that the repository isn't shallow when it is, thereby circumventing the commit-graph compatibility check. This causes problems in shallow repositories with at least shallow refs that have at least one ancestor (since the client won't have those objects, and therefore can't take the reachability closure over commits when writing a commit-graph). Address this by introducing thin wrappers over 'commit_lock_file' and 'rollback_lock_file' for use specifically when the lock is held over '.git/shallow'. These wrappers (appropriately called 'commit_shallow_file' and 'rollback_shallow_file') call into their respective functions in 'lockfile.h', but additionally reset validity checks used by the shallow machinery. Replace each instance of 'commit_lock_file' and 'rollback_lock_file' with 'commit_shallow_file' and 'rollback_shallow_file' when the lock being held is over the '.git/shallow' file. As a result, 'prune_shallow' can now only be called once (since 'check_shallow_file_for_update' will die after calling 'reset_repository_shallow'). But, this is OK since we only call 'prune_shallow' at most once per process. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23 00:25:45 +00:00
commit_shallow_file(the_repository, &shallow_lock);
alternate_shallow_file = NULL;
}
oid_array_clear(&extra);
return;
}
if (!si->nr_ours && !si->nr_theirs)
return;
remove_nonexistent_theirs_shallow(si);
if (!si->nr_ours && !si->nr_theirs)
return;
fetch-pack: unify ref in and out param When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 20:13:20 +00:00
for (i = 0; i < nr_sought; i++)
oid_array_append(&ref, &sought[i]->old_oid);
si->ref = &ref;
if (args->update_shallow) {
/*
* remote is also shallow, .git/shallow may be updated
* so all refs can be accepted. Make sure we only add
* shallow roots that are actually reachable from new
* refs.
*/
struct oid_array extra = OID_ARRAY_INIT;
struct object_id *oid = si->shallow->oid;
assign_shallow_commits_to_refs(si, NULL, NULL);
if (!si->nr_ours && !si->nr_theirs) {
oid_array_clear(&ref);
return;
}
for (i = 0; i < si->nr_ours; i++)
oid_array_append(&extra, &oid[si->ours[i]]);
for (i = 0; i < si->nr_theirs; i++)
oid_array_append(&extra, &oid[si->theirs[i]]);
setup_alternate_shallow(&shallow_lock,
&alternate_shallow_file,
&extra);
shallow.c: use '{commit,rollback}_shallow_file' In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily, 2019-01-10), the author noted that 'is_repository_shallow' produces visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'. This is a problem for e.g., fetching with '--update-shallow' in a shallow repository with 'fetch.writeCommitGraph' enabled, since the update to '.git/shallow' will cause Git to think that the repository isn't shallow when it is, thereby circumventing the commit-graph compatibility check. This causes problems in shallow repositories with at least shallow refs that have at least one ancestor (since the client won't have those objects, and therefore can't take the reachability closure over commits when writing a commit-graph). Address this by introducing thin wrappers over 'commit_lock_file' and 'rollback_lock_file' for use specifically when the lock is held over '.git/shallow'. These wrappers (appropriately called 'commit_shallow_file' and 'rollback_shallow_file') call into their respective functions in 'lockfile.h', but additionally reset validity checks used by the shallow machinery. Replace each instance of 'commit_lock_file' and 'rollback_lock_file' with 'commit_shallow_file' and 'rollback_shallow_file' when the lock being held is over the '.git/shallow' file. As a result, 'prune_shallow' can now only be called once (since 'check_shallow_file_for_update' will die after calling 'reset_repository_shallow'). But, this is OK since we only call 'prune_shallow' at most once per process. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23 00:25:45 +00:00
commit_shallow_file(the_repository, &shallow_lock);
oid_array_clear(&extra);
oid_array_clear(&ref);
alternate_shallow_file = NULL;
return;
}
/*
* remote is also shallow, check what ref is safe to update
* without updating .git/shallow
*/
CALLOC_ARRAY(status, nr_sought);
assign_shallow_commits_to_refs(si, NULL, status);
if (si->nr_ours || si->nr_theirs) {
fetch-pack: unify ref in and out param When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 20:13:20 +00:00
for (i = 0; i < nr_sought; i++)
if (status[i])
fetch-pack: unify ref in and out param When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 20:13:20 +00:00
sought[i]->status = REF_STATUS_REJECT_SHALLOW;
}
free(status);
oid_array_clear(&ref);
}
connected: refactor iterator to return next object ID directly The object ID iterator used by the connectivity checks returns the next object ID via an out-parameter and then uses a return code to indicate whether an item was found. This is a bit roundabout: instead of a separate error code, we can just return the next object ID directly and use `NULL` pointers as indicator that the iterator got no items left. Furthermore, this avoids a copy of the object ID. Refactor the iterator and all its implementations to return object IDs directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs: Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s] Range (min … max): 29.934 s … 30.406 s 10 runs Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s] Range (min … max): 29.696 s … 29.996 s 10 runs Summary '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran 1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch' While this 1% speedup could be labelled as statistically insignificant, the speedup is consistent on my machine. Furthermore, this is an end to end test, so it is expected that the improvement in the connectivity check itself is more significant. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:09:50 +00:00
static const struct object_id *iterate_ref_map(void *cb_data)
fetch-pack: write shallow, then check connectivity When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 22:08:43 +00:00
{
struct ref **rm = cb_data;
struct ref *ref = *rm;
if (!ref)
connected: refactor iterator to return next object ID directly The object ID iterator used by the connectivity checks returns the next object ID via an out-parameter and then uses a return code to indicate whether an item was found. This is a bit roundabout: instead of a separate error code, we can just return the next object ID directly and use `NULL` pointers as indicator that the iterator got no items left. Furthermore, this avoids a copy of the object ID. Refactor the iterator and all its implementations to return object IDs directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs: Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s] Range (min … max): 29.934 s … 30.406 s 10 runs Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s] Range (min … max): 29.696 s … 29.996 s 10 runs Summary '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran 1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch' While this 1% speedup could be labelled as statistically insignificant, the speedup is consistent on my machine. Furthermore, this is an end to end test, so it is expected that the improvement in the connectivity check itself is more significant. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:09:50 +00:00
return NULL;
fetch-pack: write shallow, then check connectivity When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 22:08:43 +00:00
*rm = ref->next;
connected: refactor iterator to return next object ID directly The object ID iterator used by the connectivity checks returns the next object ID via an out-parameter and then uses a return code to indicate whether an item was found. This is a bit roundabout: instead of a separate error code, we can just return the next object ID directly and use `NULL` pointers as indicator that the iterator got no items left. Furthermore, this avoids a copy of the object ID. Refactor the iterator and all its implementations to return object IDs directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs: Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s] Range (min … max): 29.934 s … 30.406 s 10 runs Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s] Range (min … max): 29.696 s … 29.996 s 10 runs Summary '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran 1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch' While this 1% speedup could be labelled as statistically insignificant, the speedup is consistent on my machine. Furthermore, this is an end to end test, so it is expected that the improvement in the connectivity check itself is more significant. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:09:50 +00:00
return &ref->old_oid;
fetch-pack: write shallow, then check connectivity When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 22:08:43 +00:00
}
struct ref *fetch_pack(struct fetch_pack_args *args,
int fd[],
const struct ref *ref,
struct ref **sought, int nr_sought,
struct oid_array *shallow,
struct string_list *pack_lockfiles,
enum protocol_version version)
{
struct ref *ref_cpy;
struct shallow_info si;
struct oid_array shallows_scratch = OID_ARRAY_INIT;
fetch_pack_setup();
if (nr_sought)
nr_sought = remove_duplicates_in_refs(sought, nr_sought);
if (version != protocol_v2 && !ref) {
packet_flush(fd[1]);
die(_("no matching remote head"));
}
if (version == protocol_v2) {
if (shallow->nr)
BUG("Protocol V2 does not provide shallows at this point in the fetch");
memset(&si, 0, sizeof(si));
ref_cpy = do_fetch_pack_v2(args, fd, ref, sought, nr_sought,
&shallows_scratch, &si,
pack_lockfiles);
} else {
prepare_shallow_info(&si, shallow);
ref_cpy = do_fetch_pack(args, fd, ref, sought, nr_sought,
&si, pack_lockfiles);
}
reprepare_packed_git(the_repository);
fetch-pack: write shallow, then check connectivity When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 22:08:43 +00:00
if (!args->cloning && args->deepen) {
struct check_connected_options opt = CHECK_CONNECTED_INIT;
struct ref *iterator = ref_cpy;
opt.shallow_file = alternate_shallow_file;
if (args->deepen)
opt.is_deepening_fetch = 1;
if (check_connected(iterate_ref_map, &iterator, &opt)) {
error(_("remote did not send all necessary objects"));
free_refs(ref_cpy);
ref_cpy = NULL;
shallow.c: use '{commit,rollback}_shallow_file' In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily, 2019-01-10), the author noted that 'is_repository_shallow' produces visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'. This is a problem for e.g., fetching with '--update-shallow' in a shallow repository with 'fetch.writeCommitGraph' enabled, since the update to '.git/shallow' will cause Git to think that the repository isn't shallow when it is, thereby circumventing the commit-graph compatibility check. This causes problems in shallow repositories with at least shallow refs that have at least one ancestor (since the client won't have those objects, and therefore can't take the reachability closure over commits when writing a commit-graph). Address this by introducing thin wrappers over 'commit_lock_file' and 'rollback_lock_file' for use specifically when the lock is held over '.git/shallow'. These wrappers (appropriately called 'commit_shallow_file' and 'rollback_shallow_file') call into their respective functions in 'lockfile.h', but additionally reset validity checks used by the shallow machinery. Replace each instance of 'commit_lock_file' and 'rollback_lock_file' with 'commit_shallow_file' and 'rollback_shallow_file' when the lock being held is over the '.git/shallow' file. As a result, 'prune_shallow' can now only be called once (since 'check_shallow_file_for_update' will die after calling 'reset_repository_shallow'). But, this is OK since we only call 'prune_shallow' at most once per process. Helped-by: Jonathan Tan <jonathantanmy@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23 00:25:45 +00:00
rollback_shallow_file(the_repository, &shallow_lock);
fetch-pack: write shallow, then check connectivity When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 22:08:43 +00:00
goto cleanup;
}
args->connectivity_checked = 1;
}
fetch-pack: unify ref in and out param When a user fetches: - at least one up-to-date ref and at least one non-up-to-date ref, - using HTTP with protocol v0 (or something else that uses the fetch command of a remote helper) some refs might not be updated after the fetch. This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow info in output parameter", 2018-06-28) which allowed transports to report the refs that they have fetched in a new out-parameter "fetched_refs". If they do so, transport_fetch_refs() makes this information available to its caller. Users of "fetched_refs" rely on the following 3 properties: (1) it is the complete list of refs that was passed to transport_fetch_refs(), (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if relevant), and (3) it has updated OIDs if ref-in-want was used (introduced after 989b8c4452). In an effort to satisfy (1), whenever transport_fetch_refs() filters the refs sent to the transport, it re-adds the filtered refs to whatever the transport supplies before returning it to the user. However, the implementation in 989b8c4452 unconditionally re-adds the filtered refs without checking if the transport refrained from reporting anything in "fetched_refs" (which it is allowed to do), resulting in an incomplete list, no longer satisfying (1). An earlier effort to resolve this [1] solved the issue by readding the filtered refs only if the transport did not refrain from reporting in "fetched_refs", but after further discussion, it seems that the better solution is to revert the API change that introduced "fetched_refs". This API change was first suggested as part of a ref-in-want implementation that allowed for ref patterns and, thus, there could be drastic differences between the input refs and the refs actually fetched [2]; we eventually decided to only allow exact ref names, but this API change remained even though its necessity was decreased. Therefore, revert this API change by reverting commit 989b8c4452, and make receive_wanted_refs() update the OIDs in the sought array (like how update_shallow() updates shallow information in the sought array) instead. A test is also included to show that the user-visible bug discussed at the beginning of this commit message no longer exists. [1] https://public-inbox.org/git/20180801171806.GA122458@google.com/ [2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 20:13:20 +00:00
update_shallow(args, sought, nr_sought, &si);
fetch-pack: write shallow, then check connectivity When fetching, connectivity is checked after the shallow file is updated. There are 2 issues with this: (1) the connectivity check is only performed up to ancestors of existing refs (which is not thorough enough if we were deepening an existing ref in the first place), and (2) there is no rollback of the shallow file if the connectivity check fails. To solve (1), update the connectivity check to check the ancestry chain completely in the case of a deepening fetch by refraining from passing "--not --all" when invoking rev-list in connected.c. To solve (2), have fetch_pack() perform its own connectivity check before updating the shallow file. To support existing use cases in which "git fetch-pack" is used to download objects without much regard as to the connectivity of the resulting objects with respect to the existing repository, the connectivity check is only done if necessary (that is, the fetch is not a clone, and the fetch involves shallow/deepen functionality). "git fetch" still performs its own connectivity check, preserving correctness but sometimes performing redundant work. This redundancy is mitigated by the fact that fetch_pack() reports if it has performed a connectivity check itself, and if the transport supports connect or stateless-connect, it will bubble up that report so that "git fetch" knows not to perform the connectivity check in such a case. This was noticed when a user tried to deepen an existing repository by fetching with --no-shallow from a server that did not send all necessary objects - the connectivity check as run by "git fetch" succeeded, but a subsequent "git fsck" failed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 22:08:43 +00:00
cleanup:
clear_shallow_info(&si);
oid_array_clear(&shallows_scratch);
return ref_cpy;
}
fetch: teach independent negotiation (no packfile) Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04 21:16:01 +00:00
static int add_to_object_array(const struct object_id *oid, void *data)
{
struct object_array *a = data;
add_object_array(lookup_object(the_repository, oid), "", a);
return 0;
}
static void clear_common_flag(struct oidset *s)
{
struct oidset_iter iter;
const struct object_id *oid;
oidset_iter_init(s, &iter);
while ((oid = oidset_iter_next(&iter))) {
struct object *obj = lookup_object(the_repository, oid);
obj->flags &= ~COMMON;
}
}
void negotiate_using_fetch(const struct oid_array *negotiation_tips,
const struct string_list *server_options,
int stateless_rpc,
int fd[],
struct oidset *acked_commits)
{
struct fetch_negotiator negotiator;
struct packet_reader reader;
struct object_array nt_object_array = OBJECT_ARRAY_INIT;
struct strbuf req_buf = STRBUF_INIT;
int haves_to_send = INITIAL_FLUSH;
int in_vain = 0;
int seen_ack = 0;
int last_iteration = 0;
timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
fetch_negotiator_init(the_repository, &negotiator);
mark_tips(&negotiator, negotiation_tips);
packet_reader_init(&reader, fd[0], NULL, 0,
PACKET_READ_CHOMP_NEWLINE |
PACKET_READ_DIE_ON_ERR_PACKET);
oid_array_for_each((struct oid_array *) negotiation_tips,
add_to_object_array,
&nt_object_array);
while (!last_iteration) {
int haves_added;
struct object_id common_oid;
int received_ready = 0;
strbuf_reset(&req_buf);
write_fetch_command_and_capabilities(&req_buf, server_options);
packet_buf_write(&req_buf, "wait-for-done");
haves_added = add_haves(&negotiator, &req_buf, &haves_to_send);
in_vain += haves_added;
if (!haves_added || (seen_ack && in_vain >= MAX_IN_VAIN))
last_iteration = 1;
/* Send request */
packet_buf_flush(&req_buf);
if (write_in_full(fd[1], req_buf.buf, req_buf.len) < 0)
die_errno(_("unable to write request to remote"));
/* Process ACKs/NAKs */
process_section_header(&reader, "acknowledgments", 0);
while (process_ack(&negotiator, &reader, &common_oid,
&received_ready)) {
struct commit *commit = lookup_commit(the_repository,
&common_oid);
if (commit) {
timestamp_t generation;
parse_commit_or_die(commit);
commit->object.flags |= COMMON;
generation = commit_graph_generation(commit);
if (generation < min_generation)
min_generation = generation;
}
in_vain = 0;
seen_ack = 1;
oidset_insert(acked_commits, &common_oid);
}
if (received_ready)
die(_("unexpected 'ready' from remote"));
else
do_check_stateless_delimiter(stateless_rpc, &reader);
if (can_all_from_reach_with_flag(&nt_object_array, COMMON,
REACH_SCRATCH, 0,
min_generation))
last_iteration = 1;
}
clear_common_flag(acked_commits);
strbuf_release(&req_buf);
}
int report_unmatched_refs(struct ref **sought, int nr_sought)
{
int i, ret = 0;
for (i = 0; i < nr_sought; i++) {
if (!sought[i])
continue;
switch (sought[i]->match_status) {
case REF_MATCHED:
continue;
case REF_NOT_MATCHED:
error(_("no such remote ref %s"), sought[i]->name);
break;
case REF_UNADVERTISED_NOT_ALLOWED:
error(_("Server does not allow request for unadvertised object %s"),
sought[i]->name);
break;
}
ret = 1;
}
return ret;
}