From 078b895fefdca94995862a4cc8644198b00a89bf Mon Sep 17 00:00:00 2001 From: Ivan Todoroski Date: Mon, 2 Apr 2012 17:13:48 +0200 Subject: [PATCH 1/4] fetch-pack: new --stdin option to read refs from stdin If a remote repo has too many tags (or branches), cloning it over the smart HTTP transport can fail because remote-curl.c puts all the refs from the remote repo on the fetch-pack command line. This can make the command line longer than the global OS command line limit, causing fetch-pack to fail. This is especially a problem on Windows where the command line limit is orders of magnitude shorter than Linux. There are already real repos out there that msysGit cannot clone over smart HTTP due to this problem. Here is an easy way to trigger this problem: git init too-many-refs cd too-many-refs echo bla > bla.txt git add . git commit -m test sha=$(git rev-parse HEAD) tag=$(perl -e 'print "bla" x 30') for i in `seq 50000`; do echo $sha refs/tags/$tag-$i >> .git/packed-refs done Then share this repo over the smart HTTP protocol and try cloning it: $ git clone http://localhost/.../too-many-refs/.git Cloning into 'too-many-refs'... fatal: cannot exec 'fetch-pack': Argument list too long 50k tags is obviously an absurd number, but it is required to demonstrate the problem on Linux because it has a much more generous command line limit. On Windows the clone fails with as little as 500 tags in the above loop, which is getting uncomfortably close to the number of tags you might see in real long lived repos. This is not just theoretical, msysGit is already failing to clone our company repo due to this. It's a large repo converted from CVS, nearly 10 years of history. Four possible solutions were discussed on the Git mailing list (in no particular order): 1) Call fetch-pack multiple times with smaller batches of refs. This was dismissed as inefficient and inelegant. 2) Add option --refs-fd=$n to pass a an fd from where to read the refs. This was rejected because inheriting descriptors other than stdin/stdout/stderr through exec() is apparently problematic on Windows, plus it would require changes to the run-command API to open extra pipes. 3) Add option --refs-from=$tmpfile to pass the refs using a temp file. This was not favored because of the temp file requirement. 4) Add option --stdin to pass the refs on stdin, one per line. In the end this option was chosen as the most efficient and most desirable from scripting perspective. There was however a small complication when using stdin to pass refs to fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for communication with the remote server. If we are going to sneak refs on stdin line by line, it would have to be done very carefully in the presence of --stateless-rpc, because when reading refs line by line we might read ahead too much data into our buffer and eat some of the remote protocol data which is also coming on stdin. One way to solve this would be to refactor get_remote_heads() in fetch-pack.c to accept a residual buffer from our stdin line parsing above, but this function is used in several places so other callers would be burdened by this residual buffer interface even when most of them don't need it. In the end we settled on the following solution: If --stdin is specified without --stateless-rpc, fetch-pack would read the refs from stdin one per line, in a script friendly format. However if --stdin is specified together with --stateless-rpc, fetch-pack would read the refs from stdin in packetized format (pkt-line) with a flush packet terminating the list of refs. This way we can read the exact number of bytes that we need from stdin, and then get_remote_heads() can continue reading from the same fd without losing a single byte of remote protocol data. This way the --stdin option only loses generality and scriptability when used together with --stateless-rpc, which is not easily scriptable anyway because it also uses pkt-line when talking to the remote server. Signed-off-by: Ivan Todoroski Signed-off-by: Junio C Hamano --- Documentation/git-fetch-pack.txt | 10 ++++++++ builtin/fetch-pack.c | 42 +++++++++++++++++++++++++++++++- fetch-pack.h | 1 + 3 files changed, 52 insertions(+), 1 deletion(-) diff --git a/Documentation/git-fetch-pack.txt b/Documentation/git-fetch-pack.txt index ed1bdaacd1..474fa307a0 100644 --- a/Documentation/git-fetch-pack.txt +++ b/Documentation/git-fetch-pack.txt @@ -32,6 +32,16 @@ OPTIONS --all:: Fetch all remote refs. +--stdin:: + Take the list of refs from stdin, one per line. If there + are refs specified on the command line in addition to this + option, then the refs from stdin are processed after those + on the command line. ++ +If '--stateless-rpc' is specified together with this option then +the list of refs must be in packet format (pkt-line). Each ref must +be in a separate packet, and the list must end with a flush packet. + -q:: --quiet:: Pass '-q' flag to 'git unpack-objects'; this makes the diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c index 6207ecd298..a9b6077af3 100644 --- a/builtin/fetch-pack.c +++ b/builtin/fetch-pack.c @@ -23,7 +23,9 @@ static struct fetch_pack_args args = { }; static const char fetch_pack_usage[] = -"git fetch-pack [--all] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=] [--depth=] [--no-progress] [-v] [:] [...]"; +"git fetch-pack [--all] [--stdin] [--quiet|-q] [--keep|-k] [--thin] " +"[--include-tag] [--upload-pack=] [--depth=] " +"[--no-progress] [-v] [:] [...]"; #define COMPLETE (1U << 0) #define COMMON (1U << 1) @@ -941,6 +943,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) args.fetch_all = 1; continue; } + if (!strcmp("--stdin", arg)) { + args.stdin_refs = 1; + continue; + } if (!strcmp("-v", arg)) { args.verbose = 1; continue; @@ -972,6 +978,40 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix) if (!dest) usage(fetch_pack_usage); + if (args.stdin_refs) { + /* + * Copy refs from cmdline to new growable list, then + * append the refs from the standard input. + */ + int alloc_heads = nr_heads; + int size = nr_heads * sizeof(*heads); + heads = memcpy(xmalloc(size), heads, size); + if (args.stateless_rpc) { + /* in stateless RPC mode we use pkt-line to read + * from stdin, until we get a flush packet + */ + static char line[1000]; + for (;;) { + int n = packet_read_line(0, line, sizeof(line)); + if (!n) + break; + if (line[n-1] == '\n') + n--; + ALLOC_GROW(heads, nr_heads + 1, alloc_heads); + heads[nr_heads++] = xmemdupz(line, n); + } + } + else { + /* read from stdin one ref per line, until EOF */ + struct strbuf line = STRBUF_INIT; + while (strbuf_getline(&line, stdin, '\n') != EOF) { + ALLOC_GROW(heads, nr_heads + 1, alloc_heads); + heads[nr_heads++] = strbuf_detach(&line, NULL); + } + strbuf_release(&line); + } + } + if (args.stateless_rpc) { conn = NULL; fd[0] = 0; diff --git a/fetch-pack.h b/fetch-pack.h index 0608edae3f..7c2069c972 100644 --- a/fetch-pack.h +++ b/fetch-pack.h @@ -10,6 +10,7 @@ struct fetch_pack_args { lock_pack:1, use_thin_pack:1, fetch_all:1, + stdin_refs:1, verbose:1, no_progress:1, include_tag:1, From 8150749da19a3e17623fc40320454afe89e04ca4 Mon Sep 17 00:00:00 2001 From: Ivan Todoroski Date: Mon, 2 Apr 2012 17:14:44 +0200 Subject: [PATCH 2/4] remote-curl: send the refs to fetch-pack on stdin Now that we can throw an arbitrary number of refs at fetch-pack using its --stdin option, we use it in the remote-curl helper to bypass the OS command line length limit. Signed-off-by: Ivan Todoroski Signed-off-by: Junio C Hamano --- remote-curl.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 48c20b86f3..543f2be8b0 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -290,6 +290,7 @@ static void output_refs(struct ref *refs) struct rpc_state { const char *service_name; const char **argv; + struct strbuf *stdin_preamble; char *service_url; char *hdr_content_type; char *hdr_accept; @@ -535,6 +536,7 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads) { const char *svc = rpc->service_name; struct strbuf buf = STRBUF_INIT; + struct strbuf *preamble = rpc->stdin_preamble; struct child_process client; int err = 0; @@ -545,6 +547,8 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads) client.argv = rpc->argv; if (start_command(&client)) exit(1); + if (preamble) + write_or_die(client.in, preamble->buf, preamble->len); if (heads) write_or_die(client.in, heads->buf, heads->len); @@ -626,13 +630,14 @@ static int fetch_git(struct discovery *heads, int nr_heads, struct ref **to_fetch) { struct rpc_state rpc; + struct strbuf preamble = STRBUF_INIT; char *depth_arg = NULL; - const char **argv; int argc = 0, i, err; + const char *argv[15]; - argv = xmalloc((15 + nr_heads) * sizeof(char*)); argv[argc++] = "fetch-pack"; argv[argc++] = "--stateless-rpc"; + argv[argc++] = "--stdin"; argv[argc++] = "--lock-pack"; if (options.followtags) argv[argc++] = "--include-tag"; @@ -651,24 +656,27 @@ static int fetch_git(struct discovery *heads, argv[argc++] = depth_arg; } argv[argc++] = url; + argv[argc++] = NULL; + for (i = 0; i < nr_heads; i++) { struct ref *ref = to_fetch[i]; if (!ref->name || !*ref->name) die("cannot fetch by sha1 over smart http"); - argv[argc++] = ref->name; + packet_buf_write(&preamble, "%s\n", ref->name); } - argv[argc++] = NULL; + packet_buf_flush(&preamble); memset(&rpc, 0, sizeof(rpc)); rpc.service_name = "git-upload-pack", rpc.argv = argv; + rpc.stdin_preamble = &preamble; rpc.gzip_request = 1; err = rpc_service(&rpc, heads); if (rpc.result.len) safe_write(1, rpc.result.buf, rpc.result.len); strbuf_release(&rpc.result); - free(argv); + strbuf_release(&preamble); free(depth_arg); return err; } From b2a9f4da645365d20ffac85cf2a6b0898258fcdc Mon Sep 17 00:00:00 2001 From: Ivan Todoroski Date: Mon, 2 Apr 2012 17:16:24 +0200 Subject: [PATCH 3/4] fetch-pack: test cases for the new --stdin option These test cases focus only on testing the parsing of refs on stdin, without bothering with the rest of the fetch-pack machinery. We pass in the refs using different combinations of command line and stdin and then we watch fetch-pack's stdout to see whether it prints all the refs we specified (but we ignore their order). Signed-off-by: Ivan Todoroski Signed-off-by: Junio C Hamano --- t/t5500-fetch-pack.sh | 66 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh index 9bf69e9a0f..1c56d9b883 100755 --- a/t/t5500-fetch-pack.sh +++ b/t/t5500-fetch-pack.sh @@ -248,4 +248,70 @@ test_expect_success 'clone shallow object count' ' grep "^count: 52" count.shallow ' +test_expect_success 'setup tests for the --stdin parameter' ' + for head in C D E F + do + add $head + done && + for head in A B C D E F + do + git tag $head $head + done && + cat >input <<-\EOF + refs/heads/C + refs/heads/A + refs/heads/D + refs/tags/C + refs/heads/B + refs/tags/A + refs/heads/E + refs/tags/B + refs/tags/E + refs/tags/D + EOF + sort expect && + ( + echo refs/heads/E && + echo refs/tags/E && + cat input + ) >input.dup +' + +test_expect_success 'fetch refs from cmdline' ' + ( + cd client && + git fetch-pack --no-progress .. $(cat ../input) + ) >output && + cut -d " " -f 2 actual && + test_cmp expect actual +' + +test_expect_success 'fetch refs from stdin' ' + ( + cd client && + git fetch-pack --stdin --no-progress .. <../input + ) >output && + cut -d " " -f 2 actual && + test_cmp expect actual +' + +test_expect_success 'fetch mixed refs from cmdline and stdin' ' + ( + cd client && + tail -n +5 ../input | + git fetch-pack --stdin --no-progress .. $(head -n 4 ../input) + ) >output && + cut -d " " -f 2 actual && + test_cmp expect actual +' + +test_expect_success 'test duplicate refs from stdin' ' + ( + cd client && + test_must_fail git fetch-pack --stdin --no-progress .. <../input.dup + ) >output && + cut -d " " -f 2 actual && + test_cmp expect actual +' + test_done From 7103d2543ad138bd9df672cdc94cc070dd7417d8 Mon Sep 17 00:00:00 2001 From: Ivan Todoroski Date: Mon, 2 Apr 2012 17:17:03 +0200 Subject: [PATCH 4/4] remote-curl: main test case for the OS command line overflow This is main test case for the original problem that triggered this patch series. We create a repo with 50k tags and then test whether git-clone over the smart HTTP protocol succeeds. Note that we construct the repo in a slightly different way than the original script used to reproduce the problem. This is because the original script just created 50k tags all pointing to the same commit, so if there was a bug where remote-curl.c was not passing all the refs to fetch-pack we wouldn't know. The clone would succeed even if only one tag was passed, because all the other tags were pointing at the same SHA and would be considered present. Instead we create a repo with 50k independent (dangling) commits and then tag each of those commits with a unique tag. This way if one of the tags is not given to fetch-pack, later stages of the clone would complain about it. This allows us to test both that the command line overflow was fixed, as well as that it was fixed in a way that doesn't leave out any of the refs. Signed-off-by: Ivan Todoroski Signed-off-by: Junio C Hamano --- t/t5551-http-fetch.sh | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/t/t5551-http-fetch.sh b/t/t5551-http-fetch.sh index 26d355725f..be6094be77 100755 --- a/t/t5551-http-fetch.sh +++ b/t/t5551-http-fetch.sh @@ -109,5 +109,36 @@ test_expect_success 'follow redirects (302)' ' git clone $HTTPD_URL/smart-redir-temp/repo.git --quiet repo-t ' +test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE + +test_expect_success EXPENSIVE 'create 50,000 tags in the repo' ' + ( + cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" && + for i in `seq 50000` + do + echo "commit refs/heads/too-many-refs" + echo "mark :$i" + echo "committer git $i +0000" + echo "data 0" + echo "M 644 inline bla.txt" + echo "data 4" + echo "bla" + # make every commit dangling by always + # rewinding the branch after each commit + echo "reset refs/heads/too-many-refs" + echo "from :1" + done | git fast-import --export-marks=marks && + + # now assign tags to all the dangling commits we created above + tag=$(perl -e "print \"bla\" x 30") && + sed -e "s/^:\(.\+\) \(.\+\)$/\2 refs\/tags\/$tag-\1/" >packed-refs + ) +' + +test_expect_success EXPENSIVE 'clone the 50,000 tag repo to check OS command line overflow' ' + git clone $HTTPD_URL/smart/repo.git too-many-refs 2>err && + test_line_count = 0 err +' + stop_httpd test_done