Commit graph

165 commits

Author SHA1 Message Date
Junio C Hamano
403ac1381c Merge branch 'jk/send-pack-check-negative-with-quick'
Performance tweak on "git push" into a repository with many refs
that point at objects we have never heard of.

* jk/send-pack-check-negative-with-quick:
  send-pack: use OBJECT_INFO_QUICK to check negative objects
2019-12-06 15:09:22 -08:00
Junio C Hamano
3ae8defaf9 Merge branch 'jk/send-pack-remote-failure'
Error handling after "git push" finishes sending the packdata and
waits for the response to the remote side has been improved.

* jk/send-pack-remote-failure:
  send-pack: check remote ref status on pack-objects failure
2019-12-01 09:04:40 -08:00
Jeff King
5cf7a17dfb send-pack: use OBJECT_INFO_QUICK to check negative objects
When pushing, we feed pack-objects a list of both positive and negative
objects. The positive objects are what we want to send, and the negative
objects are what the other side told us they have, which we can use to
limit the size of the push.

Before passing along a negative object, send_pack() will make sure we
actually have it (since we only know about it because the remote
mentioned it, not because it's one of our refs). So it's expected that
some of these objects will be missing on the local side. But looking for
a missing object is more expensive than one that we have: it triggers
reprepare_packed_git() to handle a racy repack, plus it has to explore
every alternate's loose object tree (which can be slow if you have a lot
of them, or have a high-latency filesystem).

This isn't usually a big problem, since repositories you're pushing to
don't generally have a large number of refs that are unrelated to what
the client has. But there's no reason such a setup is wrong, and it
currently performs poorly.

We can fix this by using OBJECT_INFO_QUICK, which tells the lookup
code that we expect objects to be missing. Notably, it will not re-scan
the packs, and it will use the loose cache from 61c7711cfe (sha1-file:
use loose object cache for quick existence check, 2018-11-12).

The downside is that in the rare case that we race with a local repack,
we might fail to feed some objects to pack-objects, making the resulting
push larger. But we'd never produce an invalid or incorrect push, just a
less optimal one. That seems like a reasonable tradeoff, and we already
do similar things on the fetch side (e.g., when marking COMPLETE
commits).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-30 09:10:39 -08:00
Jeff King
ad7a403268 send-pack: check remote ref status on pack-objects failure
When we're pushing a pack and our local pack-objects fails, we enter an
error code path that does a few things:

  1. Set the status of every ref to REF_STATUS_NONE

  2. Call receive_unpack_status() to try to get an error report from
     the other side

  3. Return an error to the caller

If pack-objects failed because the connection to the server dropped,
there's not much more we can do than report the hangup. And indeed, step
2 will try to read a packet from the other side, which will die() in the
packet-reading code with "the remote end hung up unexpectedly".

But if the connection _didn't_ die, then the most common issue is that
the remote index-pack or unpack-objects complained about our pack (we
could also have a local pack-objects error, but this ends up being the
same; we'd send an incomplete pack and the remote side would complain).

In that case we do report the error from the other side (because of step
2), but we fail to say anything further about the refs. The issue is
two-fold:

  - in step 1, the "NONE" status is not "we saw an error, so we have no
    status". It generally means "this ref did not match our refspecs, so
    we didn't try to push it". So when we print out the push status
    table, we won't mention any refs at all!

    But even if we had a status enum for "pack-objects error", we
    wouldn't want to blindly set it for every ref. For example, in a
    non-atomic push we might have rejected some refs already on the
    client side (e.g., REF_STATUS_REJECT_NODELETE) and we'd want to
    report that.

  - in step 2, we read just the unpack status. But receive-pack will
    also tell us about each ref (usually that it rejected them because
    of the unpacker error).

So a much better strategy is to leave the ref status fields as they are
(usually EXPECTING_REPORT) and then actually receive (and print) the
full per-ref status.

This case is actually covered in the test suite, as t5504.8, which
writes a pack that will be rejected by the remote unpack-objects. But
it's racy. Because our pack is small, most of the time pack-objects
manages to write the whole thing before the remote rejects it, and so it
returns success and we print out the errors from the remote. But very
occasionally (or when run under --stress) it goes slow enough to see a
failure in writing, and git-push reports nothing for the refs.

With this patch, the test should behave consistently.

There shouldn't be any downside to this approach. If we really did see
the connection drop, we'd already die in receive_unpack_status(), and
we'll continue to do so. If the connection drops _after_ we get the
unpack status but before we see any ref status, we'll still print the
remote unpacker error, but will now say "remote end hung up" instead of
returning the error up the call-stack. But as discussed, we weren't
showing anything more useful than that with the current code. And
anyway, that case is quite unlikely (the connection dropping at that
point would have to be unrelated to the pack-objects error, because of
the ordering of events).

In the future we might want to handle packet-read errors ourself instead
of dying, which would print a full ref status table even for hangups.
But in the meantime, this patch should be a strict improvement.

Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-13 16:13:03 +09:00
Jonathan Tan
d8bc1a518a send-pack: never fetch when checking exclusions
When building the packfile to be sent, send_pack() is given a list of
remote refs to be used as exclusions. For each ref, it first checks if
the ref exists locally, and if it does, passes it with a "^" prefix to
pack-objects. However, in a partial clone, the check may trigger a lazy
fetch.

The additional commit ancestry information obtained during such fetches
may show that certain objects that would have been sent are already
known to the server, resulting in a smaller pack being sent. But this is
at the cost of fetching from many possibly unrelated refs, and the lazy
fetches do not help at all in the typical case where the client is
up-to-date with the upstream of the branch being pushed.

Ensure that these lazy fetches do not occur.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-09 10:46:41 +09:00
Junio C Hamano
cba595ab1a Merge branch 'jk/loose-object-cache-oid'
Code clean-up.

* jk/loose-object-cache-oid:
  prefer "hash mismatch" to "sha1 mismatch"
  sha1-file: avoid "sha1 file" for generic use in messages
  sha1-file: prefer "loose object file" to "sha1 file" in messages
  sha1-file: drop has_sha1_file()
  convert has_sha1_file() callers to has_object_file()
  sha1-file: convert pass-through functions to object_id
  sha1-file: modernize loose header/stream functions
  sha1-file: modernize loose object file functions
  http: use struct object_id instead of bare sha1
  update comment references to sha1_object_info()
  sha1-file: fix outdated sha1 comment references
2019-02-06 22:05:27 -08:00
Jeff King
98374a07c9 convert has_sha1_file() callers to has_object_file()
The only remaining callers of has_sha1_file() actually have an object_id
already. They can use the "object" variant, rather than dereferencing
the hash themselves.

The code changes here were completely generated by the included
coccinelle patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08 09:41:06 -08:00
Masaya Suzuki
2d103c31c2 pack-protocol.txt: accept error packets in any context
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.

Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.

Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 13:05:30 -08:00
Masaya Suzuki
01f9ec64c8 Use packet_reader instead of packet_read_line
By using and sharing a packet_reader while handling a Git pack protocol
request, the same reader option is used throughout the code. This makes
it easy to set a reader option to the request parsing code.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 13:05:27 -08:00
Nguyễn Thái Ngọc Duy
c0e40a2d66 send-pack.c: move async's #ifdef NO_PTHREADS back to run-command.c
On systems that do not support multithread, start_async() is
implemented with fork(). This implementation details unfortunately
leak out at least in send-pack.c [1].

To keep the code base clean of NO_PTHREADS, move the this #ifdef back
to run-command.c. The new wrapper function async_with_fork() at least
helps suggest that this special "close()" is related to async in fork
mode.

[1] 09c9957cf7 (send-pack: avoid deadlock when pack-object dies early
    - 2011-04-25)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05 13:42:11 +09:00
Stefan Beller
c88134870e shallow: add repository argument to is_repository_shallow
Add a repository argument to allow callers of is_repository_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller
cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
brian m. carlson
246d7400fb send-pack: convert remaining functions to struct object_id
Convert the remaining function, feed_object, to use struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:48 -07:00
Jon Simons
bb1356dc64 always check for NULL return from packet_read_line()
The packet_read_line() function will die if it sees any
protocol or socket errors. But it will return NULL for a
flush packet; some callers which are not expecting this may
dereference NULL if they get an unexpected flush. This would
involve the other side breaking protocol, but we should
flag the error rather than segfault.

Signed-off-by: Jon Simons <jon@jonsimons.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-08 12:37:40 -08:00
René Scharfe
a923e05944 send-pack: use internal argv_array of struct child_process
Avoid a magic number of NULL placeholder values and a magic index by
constructing the command line for pack-objects using the embedded
argv_array of the child_process.  The resulting code is shorter and
easier to extend.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-22 13:33:53 -08:00
Jeff King
1cf01a34ea consistently use "fallthrough" comments in switches
Gcc 7 adds -Wimplicit-fallthrough, which can warn when a
switch case falls through to the next case. The general idea
is that the compiler can't tell if this was intentional or
not, so you should annotate any intentional fall-throughs as
such, leaving it to complain about any unannotated ones.

There's a GNU __attribute__ which can be used for
annotation, but of course we'd have to #ifdef it away on
non-gcc compilers. Gcc will also recognize
specially-formatted comments, which matches our current
practice. Let's extend that practice to all of the
unannotated sites (which I did look over and verify that
they were behaving as intended).

Ideally in each case we'd actually give some reasons in the
comment about why we're falling through, or what we're
falling through to. And gcc does support that with
-Wimplicit-fallthrough=2, which relaxes the comment pattern
matching to anything that contains "fallthrough" (or a
variety of spelling variants). However, this isn't the
default for -Wimplicit-fallthrough, nor for -Wextra. In the
name of simplicity, it's probably better for us to support
the default level, which requires "fallthrough" to be the
only thing in the comment (modulo some window dressing like
"else" and some punctuation; see the gcc manual for the
complete set of patterns).

This patch suppresses all warnings due to
-Wimplicit-fallthrough. We might eventually want to add that
to the DEVELOPER Makefile knob, but we should probably wait
until gcc 7 is more widely adopted (since earlier versions
will complain about the unknown warning type).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 12:49:57 +09:00
Rene Scharfe
872d651f52 send-pack: release strbuf on error return in send_pack()
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07 08:49:28 +09:00
Ville Skyttä
6412757514 Spelling fixes
Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-27 10:35:49 -07:00
Brandon Williams
b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Junio C Hamano
b1081e4004 Merge branch 'bc/object-id'
Conversion from unsigned char [40] to struct object_id continues.

* bc/object-id:
  Documentation: update and rename api-sha1-array.txt
  Rename sha1_array to oid_array
  Convert sha1_array_for_each_unique and for_each_abbrev to object_id
  Convert sha1_array_lookup to take struct object_id
  Convert remaining callers of sha1_array_lookup to object_id
  Make sha1_array_append take a struct object_id *
  sha1-array: convert internal storage for struct sha1_array to object_id
  builtin/pull: convert to struct object_id
  submodule: convert check_for_new_submodule_commits to object_id
  sha1_name: convert disambiguate_hint_fn to take object_id
  sha1_name: convert struct disambiguate_state to object_id
  test-sha1-array: convert most code to struct object_id
  parse-options-cb: convert sha1_array_append caller to struct object_id
  fsck: convert init_skiplist to struct object_id
  builtin/receive-pack: convert portions to struct object_id
  builtin/pull: convert portions to struct object_id
  builtin/diff: convert to struct object_id
  Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
  Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
  Define new hash-size constants for allocating memory
2017-04-19 21:37:13 -07:00
brian m. carlson
910650d2f8 Rename sha1_array to oid_array
Since this structure handles an array of object IDs, rename it to struct
oid_array.  Also rename the accessor functions and the initialization
constant.

This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:

    perl -pi -E 's/struct sha1_array/struct oid_array/g'
    perl -pi -E 's/\bsha1_array_/oid_array_/g'
    perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 08:33:56 -07:00
brian m. carlson
ee3051bd23 sha1-array: convert internal storage for struct sha1_array to object_id
Make the internal storage for struct sha1_array use an array of struct
object_id internally.  Update the users of this struct which inspect its
internals.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-28 09:59:34 -07:00
Junio C Hamano
4e87565d2a Merge branch 'sb/push-options-via-transport'
Recently we started passing the "--push-options" through the
external remote helper interface; now the "smart HTTP" remote
helper understands what to do with the passed information.

* sb/push-options-via-transport:
  remote-curl: allow push options
  send-pack: send push options correctly in stateless-rpc case
2017-03-27 10:59:27 -07:00
Brandon Williams
eb7b9749f2 send-pack: send push options correctly in stateless-rpc case
"git send-pack --stateless-rpc" puts each request in a sequence of pkt-lines
followed by a flush-pkt. The push option code forgot about this and sends push
options and their terminating delimiter as ordinary pkt-lines that get their
length header stripped off by remote-curl before being sent to the server.

The result is multiple malformed requests, which the server rejects.

Fortunately send-pack --stateless-rpc already is aware of this "pkt-line within
pkt-line" framing for the update commands that precede push options. Handle
push options the same way.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-22 15:41:15 -07:00
Jeff King
d1a13d3fcb send-pack: report signal death of pack-objects
If our pack-objects sub-process dies of a signal, then it
likely didn't have a chance to write anything useful to
stderr. The user may be left scratching their head why the
push failed. Let's detect this situation and write something
to stderr.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-07 14:58:36 -08:00
Jeff King
ba69f92db6 send-pack: read "unpack" status even on pack-objects failure
If the local pack-objects of a push fails, we'll tell the
user about it. But one likely cause is that the remote
index-pack stopped reading for some reason (because it
didn't like our input, or encountered another error). In
that case we'd expect the remote to report more details to
us via the "unpack ..." status line. However, the current
code just hangs up completely, and the user never sees it.

Instead, let's call receive_unpack_status(), which will
complain on stderr with whatever reason the remote told us.
Note that if our pack-objects fails because the connection
was severed or the remote just crashed entirely, then our
packet_read_line() call may fail with "the remote end hung
up unexpectedly". That's OK. It's a more accurate
description than what we get now (which is just "some refs
failed to push").

This should be safe from any deadlocks. At the point we make
this call we'll have closed the writing end of the
connection to the server (either by handing it off to
a pack-objects which exited, explicitly in the stateless_rpc
case, or by doing a half-duplex shutdown for a socket). So
there should be no chance that the other side is waiting
for the rest of our pack-objects input.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-07 14:57:39 -08:00
Jeff King
40d05d04dd send-pack: improve unpack-status error messages
When the remote tells us that the "unpack" step failed, we
show an error message. However, unless you are familiar with
the internals of send-pack and receive-pack, it was not
clear that this represented an error on the remote side.
Let's re-word to make that more obvious.

Likewise, when we got an unexpected packet from the other
end, we complained with a vague message but did not actually
show the packet.  Let's fix that.

And finally, neither message was marked for translation. The
message from the remote probably won't be translated, but
there's no reason we can't do better for the local half.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-07 14:54:48 -08:00
Jeff King
f7cd74d19d send-pack: use skip_prefix for parsing unpack status
This avoids repeating ourselves, and the use of magic
numbers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-07 14:54:14 -08:00
Jeff King
7c39df2979 send-pack: extract parsing of "unpack" response
After sending the pack, we call receive_status() which gets
both the "unpack" line and the ref status. Let's break these
into two functions so we can call the first part
independently.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-07 14:51:20 -08:00
Junio C Hamano
13092a916d cocci: refactor common patterns to use xstrdup_or_null()
d64ea0f83b ("git-compat-util: add xstrdup_or_null helper",
2015-01-12) added a handy wrapper that allows us to get a duplicate
of a string or NULL if the original is NULL, but a handful of
codepath predate its introduction or just weren't aware of it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-12 11:22:10 -07:00
Junio C Hamano
940622bc8b Merge branch 'rs/use-strbuf-addstr'
* rs/use-strbuf-addstr:
  use strbuf_addstr() instead of strbuf_addf() with "%s"
  use strbuf_addstr() for adding constant strings to a strbuf
2016-08-08 14:48:41 -07:00
René Scharfe
02962d3684 use strbuf_addstr() for adding constant strings to a strbuf
Replace uses of strbuf_addf() for adding strings with more lightweight
strbuf_addstr() calls.

In http-push.c it becomes easier to see what's going on without having
to verfiy that the definition of PROPFIND_ALL_REQUEST doesn't contain
any format specifiers.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-01 13:42:10 -07:00
Stefan Beller
f6a4e61fbb push: accept push options
This implements everything that is required on the client side to make use
of push options from the porcelain push command.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-14 15:50:41 -07:00
Jeff King
f0bca72dc7 send-pack: use buffered I/O to talk to pack-objects
We start a pack-objects process and then write all of the
positive and negative sha1s to it over a pipe. We do so by
formatting each item into a fixed-size buffer and then
writing each individually. This has two drawbacks:

  1. There's some manual computation of the buffer size,
     which is not immediately obvious is correct (though it
     is).

  2. We write() once per sha1, which means a lot more system
     calls than are necessary.

We can solve both by wrapping the pipe descriptor in a stdio
handle; this is the same technique used by upload-pack when
serving fetches.

Note that we can also simplify and improve the error
handling here. The original detected a single write error
and broke out of the loop (presumably to avoid writing the
error message over and over), but never actually acted on
seeing an error; we just fed truncated input and took
whatever pack-objects returned.

In practice, this probably didn't matter, as the likely
errors would be caused by pack-objects dying (and we'd
probably just die with SIGPIPE anyway). But we can easily
make this simpler and more robust; the stdio handle keeps an
error flag, which we can check at the end.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-08 16:02:40 -07:00
Jeff King
3e8b06d09c send-pack: isolate sigpipe in demuxer thread
If we get an error from pack-objects, we may exit
send_pack() early, before reading the server's status
response. In such a case, we may racily see SIGPIPE from our
async demuxer (which is trying to write that status back to
us), and we'd prefer to continue pushing the error up the
call stack, rather than taking down the whole process with
signal death.

This is safe to do because our demuxer just calls
recv_sideband, whose data writes are all done with
write_or_die(), which will notice SIGPIPE.

We do also write sideband 2 to stderr, and we would no
longer die on SIGPIPE there (if it were piped in the first
place, and if the piped program went away). But that's
probably a good thing, as it likewise should not abort the
push process at all (neither immediately by signal, nor
eventually by reporting failure back to the main thread).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-20 13:33:53 -07:00
Jeff King
739cf49161 send-pack: close demux pipe before finishing async process
This fixes a deadlock on the client side when pushing a
large number of refs from a corrupted repo.  There's a
reproduction script below, but let's start with a
human-readable explanation.

The client side of a push goes something like this:

  1. Start an async process to demux sideband coming from
     the server.

  2. Run pack-objects to send the actual pack, and wait for
     its status via finish_command().

  3. If pack-objects failed, abort immediately.

  4. If pack-objects succeeded, read the per-ref status from
     the server, which is actually coming over a pipe from
     the demux process started in step 1.

We run finish_async() to wait for and clean up the demux
process in two places. In step 3, if we see an error, we
want it to end early. And after step 4, it should be done
writing any data and we are just cleaning it up.

Let's focus on the error case first. We hand the output
descriptor to the server over to pack-objects. So by the
time it has returned an error to us, it has closed the
descriptor and the server has gotten EOF. The server will
mark all refs as failed with "unpacker error" and send us
back the status for each (followed by EOF).

This status goes to the demuxer thread, which relays it over
a pipe to the main thread. But the main thread never even
tries reading the status. It's trying to bail because of the
pack-objects error, and is waiting for the demuxer thread to
finish. If there are a small number of refs, that's OK; the
demuxer thread writes into the pipe buffer, sees EOF from
the server, and quits. But if there are a large number of
refs, it may block on write() back to the main thread,
leading to a deadlock (the main thread is waiting for the
demuxer to finish, the demuxer is waiting for the main
thread to read).

We can break this deadlock by closing the pipe between the
demuxer and the main thread before calling finish_async().
Then the demuxer gets a write() error and exits.

The non-error case usually just works, because we will have
read all of the data from the other side. We do close
demux.out already, but we only do so _after_ calling
finish_async(). This is OK because there shouldn't be any
more data coming from the server. But technically we've only
read to a flush packet, and a broken or malicious server
could be sending more cruft. In such a case, we would hit
the same deadlock. Closing the pipe first doesn't affect the
normal case, and means that for a cruft-sending server,
we'll notice a write() error rather than deadlocking.

Note that when write() sees this error, we'll actually
deliver SIGPIPE to the thread, which will take down the
whole process (unless we're compiled with NO_PTHREADS). This
isn't ideal, but it's an improvement over the status quo,
which is deadlocking. And SIGPIPE handling in async threads
is a bigger problem that we can deal with separately.

A simple reproduction for the error case is below. It's
technically racy (we could exit the main process and take
down the async thread with us before it even reads the
status), though in practice it seems to fail pretty
consistently.

    git init repo &&
    cd repo &&

    # make some commits; we need two so we can simulate corruption
    # in the history later.
    git commit --allow-empty -m one &&
    one=$(git rev-parse HEAD) &&
    git commit --allow-empty -m two &&
    two=$(git rev-parse HEAD) &&

    # now make a ton of refs; our goal here is to overflow the pipe buffer
    # when reporting the ref status, which will cause the demuxer to block
    # on write()
    for i in $(seq 20000); do
    	echo "create refs/heads/this-is-a-really-long-branch-name-$i $two"
    done |
    git update-ref --stdin &&

    # now make a corruption in the history such that pack-objects will fail
    rm -vf .git/objects/$(echo $one | sed 's}..}&/}') &&

    # and then push the result
    git init --bare dst.git &&
    git push --mirror dst.git

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-20 13:33:53 -07:00
brian m. carlson
f4e54d02b8 Convert struct ref to use object_id.
Use struct object_id in three fields in struct ref and convert all the
necessary places that use it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Junio C Hamano
b21089db6a Merge branch 'db/push-sign-if-asked'
The client side codepaths in "git push" have been cleaned up
and the user can request to perform an optional "signed push",
i.e. sign only when the other end accepts signed push.

* db/push-sign-if-asked:
  push: add a config option push.gpgSign for default signed pushes
  push: support signing pushes iff the server supports it
  builtin/send-pack.c: use parse_options API
  config.c: rename git_config_maybe_bool_text and export it as git_parse_maybe_bool
  transport: remove git_transport_options.push_cert
  gitremote-helpers.txt: document pushcert option
  Documentation/git-send-pack.txt: document --signed
  Documentation/git-send-pack.txt: wrap long synopsis line
  Documentation/git-push.txt: document when --signed may fail
2015-08-31 15:39:08 -07:00
Dave Borowitz
30261094b1 push: support signing pushes iff the server supports it
Add a new flag --sign=true (or --sign=false), which means the same
thing as the original --signed (or --no-signed).  Give it a third
value --sign=if-asked to tell push and send-pack to send a push
certificate if and only if the server advertised a push cert nonce.

If not, warn the user that their push may not be as secure as they
thought.

Signed-off-by: Dave Borowitz <dborowitz@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-19 12:58:45 -07:00
Junio C Hamano
a916cb5fb4 Merge branch 'bc/object-id'
Identify parts of the code that knows that we use SHA-1 hash to
name our objects too much, and use (1) symbolic constants instead
of hardcoded 20 as byte count and/or (2) use struct object_id
instead of unsigned char [20] for object names.

* bc/object-id:
  apply: convert threeway_stage to object_id
  patch-id: convert to use struct object_id
  commit: convert parts to struct object_id
  diff: convert struct combine_diff_path to object_id
  bulk-checkin.c: convert to use struct object_id
  zip: use GIT_SHA1_HEXSZ for trailers
  archive.c: convert to use struct object_id
  bisect.c: convert leaf functions to use struct object_id
  define utility functions for object IDs
  define a structure for object IDs
2015-05-05 21:00:23 -07:00
Junio C Hamano
268d5bc2b2 Merge branch 'jc/push-cert'
The "git push --signed" protocol extension did not limit what the
"nonce" that is a server-chosen string can contain or how long it
can be, which was unnecessarily lax.  Limit both the length and the
alphabet to a reasonably small space that can still have enough
entropy.

* jc/push-cert:
  push --signed: tighten what the receiving end can ask to sign
2015-04-20 15:28:31 -07:00
Junio C Hamano
3c6151dad3 Merge branch 'sb/atomic-push'
* sb/atomic-push:
  send-pack: unify error messages for unsupported capabilities
2015-04-02 12:34:43 -07:00
Junio C Hamano
afcb6ee30a push --signed: tighten what the receiving end can ask to sign
Instead of blindly trusting the receiving side to give us a sensible
nonce to sign, limit the length (max 256 bytes) and the alphabet
(alnum and a few selected punctuations, enough to encode in base64)
that can be used in nonce.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-02 11:05:18 -07:00
Ralf Thielow
c8b8f22aa9 send-pack: unify error messages for unsupported capabilities
If --signed is not supported, the error message names the remote
"receiving end". If --atomic is not supported, the error message
names the remote "server". Unify the naming to "receiving end"
as we're in the context of "push".

Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-02 11:02:52 -07:00
brian m. carlson
7683e2e6e3 commit: convert parts to struct object_id
Convert struct commit_graft and necessary local parts of commit.c.
Also, convert several constants based on the hex length of an SHA-1 to
use GIT_SHA1_HEXSZ, and move several magic constants into variables for
readability.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-13 22:43:13 -07:00
Junio C Hamano
39fa6112ec Merge branch 'sb/atomic-push'
"git push" has been taught a "--atomic" option that makes push to
update more than one ref an "all-or-none" affair.

* sb/atomic-push:
  Document receive.advertiseatomic
  t5543-atomic-push.sh: add basic tests for atomic pushes
  push.c: add an --atomic argument
  send-pack.c: add --atomic command line argument
  send-pack: rename ref_update_to_be_sent to check_to_send_update
  receive-pack.c: negotiate atomic push support
  receive-pack.c: add execute_commands_atomic function
  receive-pack.c: move transaction handling in a central place
  receive-pack.c: move iterating over all commands outside execute_commands
  receive-pack.c: die instead of error in case of possible future bug
  receive-pack.c: shorten the execute_commands loop over all commands
2015-02-11 13:43:51 -08:00
Ronnie Sahlberg
4ff17f10c4 send-pack.c: add --atomic command line argument
This adds support to send-pack to negotiate and use atomic pushes
iff the server supports it. Atomic pushes are activated by a new command
line flag --atomic.

In order to do this we also need to change the semantics for send_pack()
slightly. The existing send_pack() function actually doesn't send all the
refs back to the server when multiple refs are involved, for example
when using --all. Several of the failure modes for pushes can already be
detected locally in the send_pack client based on the information from the
initial server side list of all the refs as generated by receive-pack.
Any such refs that we thus know would fail to push are thus pruned from
the list of refs we send to the server to update.

For atomic pushes, we have to deal thus with both failures that are detected
locally as well as failures that are reported back from the server. In order
to do so we treat all local failures as push failures too.

We introduce a new status code REF_STATUS_ATOMIC_PUSH_FAILED so we can
flag all refs that we would normally have tried to push to the server
but we did not due to local failures. This is to improve the error message
back to the end user to flag that "these refs failed to update since the
atomic push operation failed."

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-07 19:56:44 -08:00
Stefan Beller
7582e9397c send-pack: rename ref_update_to_be_sent to check_to_send_update
This renames ref_update_to_be_sent to check_to_send_update and inverts
the meaning of the return value. Having the return value inverted we
can have different values for the error codes. This is useful in a
later patch when we want to know if we hit the CHECK_REF_STATUS_REJECTED
case.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-07 19:56:44 -08:00
brian m. carlson
2dacf26d09 pack-objects: use --objects-edge-aggressive for shallow repos
When fetching into or pushing from a shallow repository, we want to
aggressively mark edges as uninteresting, since this decreases the pack
size.  However, aggressively marking edges can negatively affect
performance on large non-shallow repositories with lots of refs.

Teach pack-objects a --shallow option to indicate that we're pushing
from or fetching into a shallow repository.  Use
--objects-edge-aggressive only for shallow repositories and otherwise
use --objects-edge, which performs better in the general case.  Update
the callers to pass the --shallow option when they are dealing with a
shallow repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-29 09:58:25 -08:00
Junio C Hamano
fb06b5280e Merge branch 'jc/push-cert'
Allow "git push" request to be signed, so that it can be verified and
audited, using the GPG signature of the person who pushed, that the
tips of branches at a public repository really point the commits
the pusher wanted to, without having to "trust" the server.

* jc/push-cert: (24 commits)
  receive-pack::hmac_sha1(): copy the entire SHA-1 hash out
  signed push: allow stale nonce in stateless mode
  signed push: teach smart-HTTP to pass "git push --signed" around
  signed push: fortify against replay attacks
  signed push: add "pushee" header to push certificate
  signed push: remove duplicated protocol info
  send-pack: send feature request on push-cert packet
  receive-pack: GPG-validate push certificates
  push: the beginning of "git push --signed"
  pack-protocol doc: typofix for PKT-LINE
  gpg-interface: move parse_signature() to where it should be
  gpg-interface: move parse_gpg_output() to where it should be
  send-pack: clarify that cmds_sent is a boolean
  send-pack: refactor inspecting and resetting status and sending commands
  send-pack: rename "new_refs" to "need_pack_data"
  receive-pack: factor out capability string generation
  send-pack: factor out capability string generation
  send-pack: always send capabilities
  send-pack: refactor decision to send update per ref
  send-pack: move REF_STATUS_REJECT_NODELETE logic a bit higher
  ...
2014-10-08 13:05:25 -07:00