2018-03-14 18:31:41 +00:00
|
|
|
#include "builtin.h"
|
2018-05-08 06:59:15 +00:00
|
|
|
#include "exec-cmd.h"
|
2023-03-21 06:25:54 +00:00
|
|
|
#include "gettext.h"
|
2018-03-14 18:31:41 +00:00
|
|
|
#include "pkt-line.h"
|
|
|
|
#include "parse-options.h"
|
2023-05-16 06:33:59 +00:00
|
|
|
#include "path.h"
|
2018-03-14 18:31:41 +00:00
|
|
|
#include "protocol.h"
|
2023-02-24 00:09:33 +00:00
|
|
|
#include "replace-object.h"
|
2018-03-14 18:31:41 +00:00
|
|
|
#include "upload-pack.h"
|
2018-03-15 17:31:21 +00:00
|
|
|
#include "serve.h"
|
upload-pack: always turn off save_commit_buffer

When the client sends us "want $oid" lines, we call parse_object($oid)
to get an object struct. It's important to parse the commits because we
need to traverse them in the negotiation phase. But of course we don't
need to hold on to the commit messages for each one.

We've turned off the save_commit_buffer flag in get_common_commits() for
a long time, since f0243f26f6 (git-upload-pack: More efficient usage of
the has_sha1 array, 2005-10-28). That helps with the commits we see
while actually traversing. But:

  1. That function is only used by the v0 protocol. I think the v2
     protocol's code path leaves the flag on (and thus pays the extra
     memory penalty), though I didn't measure it specifically.

  2. If the client sends us a bunch of "want" lines, that happens before
     the negotiation phase. So we'll hold on to all of those commit
     messages. Generally the number of "want" lines scales with the
     refs, not with the number of objects in the repo. But a malicious
     client could send a lot in order to waste memory.

As an example of (2), if I generate a request to fetch all commits in
git.git like this:

  pktline() {
    local msg="$*"
    printf "%04x%s\n" $((1+4+${#msg})) "$msg"
  }

  want_commits() {
    pktline command=fetch
    printf 0001
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
    while read oid type; do
      test "$type" = "commit" || continue
      pktline want $oid
    done
    pktline done
    printf 0000
  }

  want_commits | GIT_PROTOCOL=version=2 valgrind --tool=massif git-upload-pack . >/dev/null

before this patch upload-pack peaks at ~125MB, and after at ~35MB. The
difference is not coincidentally about the same as the sum of all commit
object sizes as computed by:

  git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectsize)' |
  perl -alne '$v += $F[1] if $F[0] eq "commit"; END { print $v }'

In a larger repository like linux.git, that number is ~1GB.

In a repository with a full commit-graph file this will have no impact
(and the commit graph would save us from parsing at all, so is a much
better solution!). But it's easy to do, might help a little in
real-world cases (where even if you have a commit graph it might not be
fully up to date), and helps a lot for a worst-case malicious request.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 22:39:00 +00:00
|
|
|
#include "commit.h"
|
2024-04-24 07:11:55 +00:00
|
|
|
#include "environment.h"
|
2018-03-14 18:31:41 +00:00
|
|
|
|
|
|
|
static const char * const upload_pack_usage[] = {
|
2022-10-13 15:39:13 +00:00
|
|
|
N_("git-upload-pack [--[no-]strict] [--timeout=<n>] [--stateless-rpc]\n"
|
|
|
|
" [--advertise-refs] <directory>"),
|
2018-03-14 18:31:41 +00:00
|
|
|
NULL
|
|
|
|
};
|
|
|
|
|
|
|
|
int cmd_upload_pack(int argc, const char **argv, const char *prefix)
|
|
|
|
{
|
|
|
|
const char *dir;
|
|
|
|
int strict = 0;
|
serve.[ch]: remove "serve_options", split up --advertise-refs code

The "advertise capabilities" mode of serve.c added in
ed10cb952d3 (serve: introduce git-serve, 2018-03-15) is only used by
http-backend.c to call {upload,receive}-pack with the
--advertise-refs parameter. See 42526b478e3 (Add stateless RPC options
to upload-pack, receive-pack, 2009-10-30).

Let's just make cmd_upload_pack() take the two (v2) or three (v1)
parameters the v2/v1 servicing functions need directly, and pass
those in via the function signature. The logic of whether daemon mode
is implied by the timeout belongs in the v1 function (only used
there).

Once we split up the "advertise v2 refs" from "serve v2 request" it
becomes clear that v2 never cared about those in combination. The only
time it mattered was for v1 to emit its ref advertisement; in that
case we wanted to emit the smart-http-only "no-done" capability.

Since we only do that in the --advertise-refs codepath, let's just have
it set "do_done" itself in v1's upload_pack() just before send_ref().
At that point --advertise-refs and --stateless-rpc in combination are
redundant (the only user is get_info_refs() in http-backend.c), so we
can just pass in --advertise-refs only.

Since we need to touch all the serve() and advertise_capabilities()
codepaths, let's rename them to less clever and more obvious names. It's
been suggested numerous times, the latest of which is [1]'s suggestion
for protocol_v2_serve_loop(). Let's go with that.

1. https://lore.kernel.org/git/CAFQ2z_NyGb8rju5CKzmo6KhZXD0Dp21u-BbyCb2aNxLEoSPRJw@mail.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-05 01:25:42 +00:00
|
|
|
int advertise_refs = 0;
|
|
|
|
int stateless_rpc = 0;
|
|
|
|
int timeout = 0;
|
2018-03-14 18:31:41 +00:00
|
|
|
struct option options[] = {
|
2021-08-05 01:25:42 +00:00
|
|
|
OPT_BOOL(0, "stateless-rpc", &stateless_rpc,
|
2018-03-14 18:31:41 +00:00
|
|
|
N_("quit after a single request/response exchange")),
|
2021-08-05 01:25:43 +00:00
|
|
|
OPT_HIDDEN_BOOL(0, "http-backend-info-refs", &advertise_refs,
|
|
|
|
N_("serve up the info/refs for git-http-backend")),
|
|
|
|
OPT_ALIAS(0, "advertise-refs", "http-backend-info-refs"),
|
2018-03-14 18:31:41 +00:00
|
|
|
OPT_BOOL(0, "strict", &strict,
|
|
|
|
N_("do not try <directory>/.git/ if <directory> is no Git directory")),
|
2021-08-05 01:25:42 +00:00
|
|
|
OPT_INTEGER(0, "timeout", &timeout,
|
2018-03-14 18:31:41 +00:00
|
|
|
N_("interrupt transfer after <n> seconds of inactivity")),
|
|
|
|
OPT_END()
|
|
|
|
};
|
|
|
|
|
|
|
|
packet_trace_identity("upload-pack");
|
2023-06-06 13:24:35 +00:00
|
|
|
disable_replace_refs();
|
2024-02-28 22:39:00 +00:00
|
|
|
save_commit_buffer = 0;
|
2024-04-24 07:11:55 +00:00
|
|
|
xsetenv(NO_LAZY_FETCH_ENVIRONMENT, "1", 0);
|
2018-03-14 18:31:41 +00:00
|
|
|
|
2019-05-09 21:28:51 +00:00
|
|
|
argc = parse_options(argc, argv, prefix, options, upload_pack_usage, 0);
|
2018-03-14 18:31:41 +00:00
|
|
|
|
|
|
|
if (argc != 1)
|
|
|
|
usage_with_options(upload_pack_usage, options);
|
|
|
|
|
|
|
|
setup_path();
|
|
|
|
|
|
|
|
dir = argv[0];
|
|
|
|
|
|
|
|
if (!enter_repo(dir, strict))
|
|
|
|
die("'%s' does not appear to be a git repository", dir);
|
|
|
|
|
|
|
|
switch (determine_protocol_version_server()) {
|
2018-03-14 18:31:47 +00:00
|
|
|
case protocol_v2:
|
2021-08-05 01:25:42 +00:00
|
|
|
if (advertise_refs)
|
|
|
|
protocol_v2_advertise_capabilities();
|
|
|
|
else
|
|
|
|
protocol_v2_serve_loop(stateless_rpc);
|
2018-03-14 18:31:47 +00:00
|
|
|
break;
|
2018-03-14 18:31:41 +00:00
|
|
|
case protocol_v1:
|
|
|
|
/*
|
|
|
|
* v1 is just the original protocol with a version string,
|
|
|
|
* so just fall through after writing the version string.
|
|
|
|
*/
|
2021-08-05 01:25:42 +00:00
|
|
|
if (advertise_refs || !stateless_rpc)
|
2018-03-14 18:31:41 +00:00
|
|
|
packet_write_fmt(1, "version 1\n");
|
|
|
|
|
|
|
|
/* fallthrough */
|
|
|
|
case protocol_v0:
|
2021-08-05 01:25:42 +00:00
|
|
|
upload_pack(advertise_refs, stateless_rpc, timeout);
|
2018-03-14 18:31:41 +00:00
|
|
|
break;
|
|
|
|
case protocol_unknown_version:
|
|
|
|
BUG("unknown protocol version");
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|