Commit graph

69 commits

Author SHA1 Message Date
Jeff King
af22a63c39 sideband: diagnose more sideband anomalies
In demultiplex_sideband(), there are two oddities when we check an
incoming packet:

  - if it has zero length, then we assume it's a flush packet. This
    means we fail to notice the difference between a real flush and a
    true zero-length packet that's missing its sideband designator. It's
    not a huge problem in practice because we'd never send a zero-length
    data packet (even our keepalives are otherwise-empty sideband-1
    packets).

    But it would be nice to detect and report the error, since it's
    likely to cause other confusion (we think the other side flushed,
    but they do not).

  - we try to detect packets missing their designator by checking for
    "if (len < 1)". But this will never trigger for "len == 0"; we've
    already detected that and left the function before then.

    It _could_ detect a negative "len" parameter. But in that case, the
    error message is wrong. The issue is not "no sideband" but rather
    "eof while reading the packet". However, this can't actually be
    triggered in practice, because neither of the two callers uses
    pkt_read's GENTLE_ON_EOF flag. Which means they'd die with "the
    remote end hung up unexpectedly" before we even get here.

    So this truly is dead code.

We can improve these cases by passing in a pkt-line status to the
demultiplexer, and by having recv_sideband() use GENTLE_ON_EOF. This
gives us two improvements:

  - we can now reliably detect flush packets, and will report a normal
    packet missing its sideband designator as an error

  - we'll report an eof with a more detailed "protocol error: eof while
    reading sideband packet", rather than the generic "the remote end
    hung up unexpectedly"

  - when we see an eof, we'll flush the sideband scratch buffer, which
    may provide some hints from the remote about why they hung up
    (though note we already flush on newlines, so it's likely that most
    such messages already made it through)

In some sense this patch goes against fbd76cd450 (sideband: reverse its
dependency on pkt-line, 2019-01-16), which caused the sideband code not
to depend on the pkt-line code. But that commit was really just trying
to deal with the circular header dependency. The two modules are
conceptually interlinked, and it was just trying to keep things
compiling. And indeed, there's a sticking point in this patch: because
pkt-line.h includes sideband.h, we can't add the reverse include we need
for the sideband code to have an "enum packet_read_status" parameter.
Nor can we forward declare it, because you can't forward declare an enum
in C. However, C does guarantee that enums fit in an int, so we can just
use that type.

One alternative would be for the callers to check themselves that they
got something sane from the pkt-line code. But besides duplicating
logic, this gets quite tricky. Any error condition requires flushing the
sideband #2 scratch buffer, which only demultiplex_sideband() knows how
to do.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29 09:23:29 -07:00
Johannes Schindelin
8e86cf6581 sideband: report unhandled incomplete sideband messages as bugs
It was pretty tricky to verify that incomplete sideband messages are
handled correctly by the `recv_sideband()`/`demultiplex_sideband()`
code: they have to be flushed out at the end of the loop in
`recv_sideband()`, but the actual flushing is done by the
`demultiplex_sideband()` function (which therefore has to know somehow
that the loop will be done after it returns).

To catch future bugs where incomplete sideband messages might not be
shown by mistake, let's catch that condition and report a bug.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 13:31:02 -07:00
Junio C Hamano
12210859da Merge branch 'bc/sha-256-part-2'
SHA-256 migration work continues.

* bc/sha-256-part-2: (44 commits)
  remote-testgit: adapt for object-format
  bundle: detect hash algorithm when reading refs
  t5300: pass --object-format to git index-pack
  t5704: send object-format capability with SHA-256
  t5703: use object-format serve option
  t5702: offer an object-format capability in the test
  t/helper: initialize the repository for test-sha1-array
  remote-curl: avoid truncating refs with ls-remote
  t1050: pass algorithm to index-pack when outside repo
  builtin/index-pack: add option to specify hash algorithm
  remote-curl: detect algorithm for dumb HTTP by size
  builtin/ls-remote: initialize repository based on fetch
  t5500: make hash independent
  serve: advertise object-format capability for protocol v2
  connect: parse v2 refs with correct hash algorithm
  connect: pass full packet reader when parsing v2 refs
  Documentation/technical: document object-format for protocol v2
  t1302: expect repo format version 1 for SHA-256
  builtin/show-index: provide options to determine hash algo
  t5302: modernize test formatting
  ...
2020-07-06 22:09:13 -07:00
brian m. carlson
9a9f0d3fc0 pkt-line: add a member for hash algorithm
Add a member for the hash algorithm currently in use to the packet
reader so it can parse references correctly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27 10:07:06 -07:00
Denton Liu
0181b600a6 pkt-line: define PACKET_READ_RESPONSE_END
In a future commit, we will use PACKET_READ_RESPONSE_END to separate
messages proxied by remote-curl. To prepare for this, add the
PACKET_READ_RESPONSE_END enum value.

In switch statements that need a case added, die() or BUG() when a
PACKET_READ_RESPONSE_END is unexpected. Otherwise, mirror how
PACKET_READ_DELIM is implemented (especially in cases where packets are
being forwarded).

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24 16:26:00 -07:00
Denton Liu
101736a14c pkt-line: extern packet_length()
In a future commit, we will be manually processing packets and we will
need to access the length header. In order to simplify this, extern
packet_length() so that the logic can be reused.

Change the function parameter from `const char *linelen` to
`const char lenbuf_hex[4]`. Even though these two types behave
identically as function parameters, use the array notation to
semantically indicate exactly what this function is expecting as an
argument. Also, rename it from linelen to lenbuf_hex as the former
sounds like it should be an integral type which is misleading.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-19 15:40:26 -07:00
Junio C Hamano
56bea28036 pkt-line: drop 'const'-ness of a param to set_packet_header()
The function's definition has a paramter of type "int" qualified as
"const".  The fact that the incoming parameter is used as read-only
in the fuction is an implementation detail that the callers should
not have to be told in the prototype declaring it (and "const" there
has no effect, as C passes parameters by value).

The prototype defined for the function in pkt-line.h lacked the
matching "const" for this reason, but apparently some compilers
(e.g. MS Visual C 2017) complain about the parameter type mismatch.

Let's squelch it by removing the "const" that is pointless in the
definition of a small and trivial function like this, which would
not help optimizing compilers nor reading humans that much.

Noticed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-16 13:29:06 +09:00
Jeff King
533ddba47e pkt-line: prepare buffer before handling ERR packets
Since 2d103c31c2 (pack-protocol.txt: accept error packets in any
context, 2018-12-29), the pktline code will detect an ERR packet and die
automatically, saving the caller from dealing with it. But we do so too
early in the function, before we have terminated the buffer with a NUL.

As a result, passing the ERR message to die() may result in us printing
random cruft from a previous packet. This doesn't trigger memory tools
like ASan because we reuse the same buffer over and over (so the
contents are valid and initialized; they're just stale).

We can see demonstrate this by tightening the regex we use to match the
error message in t5516; without this patch, git-fetch will accidentally
print the capabilities from the (much longer) initial packet we
received.

By moving the ERR code later in the function we get a few other
benefits, too:

  - we'll now chomp any newline sent by the other side (which is what we
    want, since die() will add its own newline)

  - we'll now mention the ERR packet with GIT_TRACE_PACKET

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-15 14:00:51 +09:00
Junio C Hamano
27cdbdd134 Merge branch 'jk/no-sigpipe-during-network-transport'
On platforms where "git fetch" is killed with SIGPIPE (e.g. OSX),
the upload-pack that runs on the other end that hangs up after
detecting an error could cause "git fetch" to die with a signal,
which led to a flakey test.  "git fetch" now ignores SIGPIPE during
the network portion of its operation (this is not a problem as we
check the return status from our write(2)s).

* jk/no-sigpipe-during-network-transport:
  fetch: ignore SIGPIPE during network operation
  fetch: avoid calling write_or_die()
2019-03-20 15:16:06 +09:00
Jeff King
37c80012f1 fetch: avoid calling write_or_die()
The write_or_die() function has one quirk that a caller might not
expect: when it sees EPIPE from the write() call, it translates that
into a death by SIGPIPE. This doesn't change the overall behavior (the
program exits either way), but it does potentially confuse test scripts
looking for a non-signal exit code.

Let's switch away from using write_or_die() in a few code paths, which
will give us more consistent exit codes. It also gives us the
opportunity to write more descriptive error messages, since we have
context that write_or_die() does not.

Note that this won't do much by itself, since we'd typically be killed
by SIGPIPE before write_or_die() even gets a chance to do its thing.
That will be addressed in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-05 15:02:01 +09:00
Jonathan Tan
a97d00799a remote-curl: use post_rpc() for protocol v2 also
When transmitting and receiving POSTs for protocol v0 and v1,
remote-curl uses post_rpc() (and associated functions), but when doing
the same for protocol v2, it uses a separate set of functions
(proxy_rpc() and others). Besides duplication of code, this has caused
at least one bug: the auth retry mechanism that was implemented in v0/v1
was not implemented in v2.

To fix this issue and avoid it in the future, make remote-curl also use
post_rpc() when handling protocol v2. Because line lengths are written
to the HTTP request in protocol v2 (unlike in protocol v0/v1), this
necessitates changes in post_rpc() and some of the functions it uses;
perform these changes too.

A test has been included to ensure that the code for both the unchunked
and chunked variants of the HTTP request is exercised.

Note: stateless_connect() has been updated to use the lower-level packet
reading functions instead of struct packet_reader. The low-level control
is necessary here because we cannot change the destination buffer of
struct packet_reader while it is being used; struct packet_buffer has a
peeking mechanism which relies on the destination buffer being present
in between a peek and a read.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-03 19:00:42 +09:00
Jonathan Tan
0bbc0bc574 {fetch,upload}-pack: sideband v2 fetch response
Currently, a response to a fetch request has sideband support only while
the packfile is being sent, meaning that the server cannot send notices
until the start of the packfile.

Extend sideband support in protocol v2 fetch responses to the whole
response. upload-pack will advertise it if the
uploadpack.allowsidebandall configuration variable is set, and
fetch-pack will automatically request it if advertised.

If the sideband is to be used throughout the whole response, upload-pack
will use it to send errors instead of prefixing a PKT-LINE payload with
"ERR ".

This will be tested in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 11:25:07 -08:00
Jonathan Tan
fbd76cd450 sideband: reverse its dependency on pkt-line
A subsequent patch will teach struct packet_reader a new field that, if
set, instructs it to interpret read data as multiplexed. This will
create a dependency from pkt-line to sideband.

To avoid a circular dependency, split recv_sideband() into 2 parts: the
reading loop (left in recv_sideband()) and the processing of the
contents (in demultiplex_sideband()), and move the former into pkt-line.
This reverses the direction of dependency: sideband no longer depends on
pkt-line, and pkt-line now depends on sideband.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 11:25:07 -08:00
Jonathan Tan
bc2e795cea pkt-line: introduce struct packet_writer
A future patch will allow the client to request multiplexing of the
entire fetch response (and not only during packfile transmission), which
in turn allows the server to send progress and keepalive messages at any
time during the response.

It will be convenient for a future patch if writing options
(specifically, whether the written data is to be multiplexed) could be
controlled from a single place, so create struct packet_writer to serve
as that place, and modify upload-pack to use it.

Currently, it only stores the output fd, but a subsequent patch will (as
described above) introduce an option to determine if the written data is
to be multiplexed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 11:44:48 -08:00
Masaya Suzuki
2d103c31c2 pack-protocol.txt: accept error packets in any context
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.

Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.

Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 13:05:30 -08:00
Nguyễn Thái Ngọc Duy
c60d7697d1 pkt-line.c: mark more strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 11:19:10 -07:00
Nguyễn Thái Ngọc Duy
1a07e59c3e Update messages in preparation for i18n
Many messages will be marked for translation in the following
commits. This commit updates some of them to be more consistent and
reduce diff noise in those commits. Changes are

- keep the first letter of die(), error() and warning() in lowercase
- no full stop in die(), error() or warning() if it's single sentence
  messages
- indentation
- some messages are turned to BUG(), or prefixed with "BUG:" and will
  not be marked for i18n
- some messages are improved to give more information
- some messages are broken down by sentence to be i18n friendly
  (on the same token, combine multiple warning() into one big string)
- the trailing \n is converted to printf_ln if possible, or deleted
  if not redundant
- errno_errno() is used instead of explicit strerror()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 11:19:09 -07:00
Junio C Hamano
50f08db594 Merge branch 'js/use-bug-macro'
Developer support update, by using BUG() macro instead of die() to
mark codepaths that should not happen more clearly.

* js/use-bug-macro:
  BUG_exit_code: fix sparse "symbol not declared" warning
  Convert remaining die*(BUG) messages
  Replace all die("BUG: ...") calls by BUG() ones
  run-command: use BUG() to report bugs, not die()
  test-tool: help verifying BUG() code paths
2018-05-30 14:04:07 +09:00
Johannes Schindelin
033abf97fc Replace all die("BUG: ...") calls by BUG() ones
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).

The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.

Let's just convert all remaining ones in one fell swoop.

This trick was performed by this invocation:

	sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-06 19:06:13 +09:00
Brandon Williams
f1f4d8acf4 pkt-line: add packet_buf_write_len function
Add the 'packet_buf_write_len()' function which allows for writing an
arbitrary length buffer into a 'struct strbuf' and formatting it in
packet-line format.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:01:09 -07:00
Brandon Williams
a4cfd41c7b pkt-line: add delim packet support
One of the design goals of protocol-v2 is to improve the semantics of
flush packets.  Currently in protocol-v1, flush packets are used both to
indicate a break in a list of packet lines as well as an indication that
one side has finished speaking.  This makes it particularly difficult
to implement proxies as a proxy would need to completely understand git
protocol instead of simply looking for a flush packet.

To do this, introduce the special deliminator packet '0001'.  A delim
packet can then be used as a deliminator between lists of packet lines
while flush packets can be reserved to indicate the end of a response.

Documentation for how this packet will be used in protocol v2 will
included in a future patch.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 14:15:06 -07:00
Brandon Williams
77dabc14c4 pkt-line: allow peeking a packet line without consuming it
Sometimes it is advantageous to be able to peek the next packet line
without consuming it (e.g. to be able to determine the protocol version
a server is speaking).  In order to do that introduce 'struct
packet_reader' which is an abstraction around the normal packet reading
logic.  This enables a caller to be able to peek a single line at a time
using 'packet_reader_peek()' and having a caller consume a line by
calling 'packet_reader_read()'.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 14:15:06 -07:00
Brandon Williams
2153d478b7 pkt-line: introduce packet_read_with_status
The current pkt-line API encodes the status of a pkt-line read in the
length of the read content.  An error is indicated with '-1', a flush
with '0' (which can be confusing since a return value of '0' can also
indicate an empty pkt-line), and a positive integer for the length of
the read content otherwise.  This doesn't leave much room for allowing
the addition of additional special packets in the future.

To solve this introduce 'packet_read_with_status()' which reads a packet
and returns the status of the read encoded as an 'enum packet_status'
type.  This allows for easily identifying between special and normal
packets as well as errors.  It also enables easily adding a new special
packet in the future.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 14:15:05 -07:00
Junio C Hamano
4c6dad0059 Merge branch 'bw/protocol-v1'
A new mechanism to upgrade the wire protocol in place is proposed
and demonstrated that it works with the older versions of Git
without harming them.

* bw/protocol-v1:
  Documentation: document Extra Parameters
  ssh: introduce a 'simple' ssh variant
  i5700: add interop test for protocol transition
  http: tell server that the client understands v1
  connect: tell server that the client understands v1
  connect: teach client to recognize v1 server response
  upload-pack, receive-pack: introduce protocol version 1
  daemon: recognize hidden request arguments
  protocol: introduce protocol extension mechanisms
  pkt-line: add packet_write function
  connect: in ref advertisement, shallows are last
2017-12-06 09:23:44 -08:00
Brandon Williams
5d2124b34a pkt-line: add packet_write function
Add a function which can be used to write the contents of an arbitrary
buffer.  This makes it easy to build up data in a buffer before writing
the packet instead of formatting the entire contents of the packet using
'packet_write_fmt()'.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-17 10:51:29 +09:00
Jeff King
61d36330b4 prefer "!=" when checking read_in_full() result
Comparing the result of read_in_full() using less-than is
potentially dangerous, as a negative return value may be
converted to an unsigned type and be considered a success.
This is discussed further in 561598cfcf (read_pack_header:
handle signed/unsigned comparison in read result,
2017-09-13).

Each of these instances is actually fine in practice:

 - in get-tar-commit-id, the HEADERSIZE macro expands to a
   signed integer. If it were switched to an unsigned type
   (e.g., a size_t), then it would be a bug.

 - the other two callers check for a short read only after
   handling a negative return separately. This is a fine
   practice, but we'd prefer to model "!=" as a general
   rule.

So all of these cases can be considered cleanups and not
actual bugfixes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-27 15:45:24 +09:00
Junio C Hamano
c50424a6f0 Merge branch 'jk/write-in-full-fix'
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.

* jk/write-in-full-fix:
  read_pack_header: handle signed/unsigned comparison in read result
  config: flip return value of store_write_*()
  notes-merge: use ssize_t for write_in_full() return value
  pkt-line: check write_in_full() errors against "< 0"
  convert less-trivial versions of "write_in_full() != len"
  avoid "write_in_full(fd, buf, len) != len" pattern
  get-tar-commit-id: check write_in_full() return against 0
  config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-09-25 15:24:06 +09:00
Junio C Hamano
c78e182d55 Merge branch 'ma/pkt-line-leakfix'
A leakfix.

* ma/pkt-line-leakfix:
  pkt-line: re-'static'-ify buffer in packet_write_fmt_1()
2017-09-19 10:47:52 +09:00
Jeff King
4c95e3dd28 pkt-line: check write_in_full() errors against "< 0"
As with the previous two commits, we prefer to check
write_in_full()'s return value to see if it is negative,
rather than comparing it to the input length.

These cases actually flip the logic to check for success,
making conversion a little different than in other cases. We
could of course write:

  if (write_in_full(...) >= 0)
          return 0;
  return error(...);

But our usual method of spelling write() error checks is
just "< 0". So let's flip the logic for each of these
conditionals to our usual style.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14 15:17:59 +09:00
Martin Ågren
150efef1e7 pkt-line: re-'static'-ify buffer in packet_write_fmt_1()
The static-ness was silently dropped in commit 70428d1a5 ("pkt-line: add
packet_write_fmt_gently()", 2016-10-16). As a result, for each call to
packet_write_fmt_1, we allocate and leak a buffer.

We could keep the strbuf non-static and instead make sure we always
release it before returning (but not before we die, so that we don't
touch errno). That would also prepare us for threaded use. But until
that needs to happen, let's just restore the static-ness so that we get
back to a situation where we (eventually) do not continuosly keep
allocating memory.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 13:11:14 +09:00
Jonathan Tan
fa64a2fdbe sub-process: refactor handshake to common function
Refactor, into a common function, the version and capability negotiation
done when invoking a long-running process as a clean or smudge filter.
This will be useful for other Git code that needs to interact similarly
with a long-running process.

As you can see in the change to t0021, this commit changes the error
message reported when the long-running process does not introduce itself
with the expected "server"-terminated line. Originally, the error
message reports that the filter "does not support filter protocol
version 2", differentiating between the old single-file filter protocol
and the new multi-file filter protocol - I have updated it to something
more generic and useful.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-26 13:00:40 -07:00
Ben Peart
c0c70f7ac0 convert: move packet_write_line() into pkt-line as packet_writel()
Add packet_writel() which writes multiple lines in a single call and
then calls packet_flush_gently(). Update convert.c to use the new
packet_writel() function from pkt-line.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 10:57:26 +09:00
Ben Peart
825b9226bf pkt-line: add packet_read_line_gently()
Add packet_read_line_gently() to enable reading a line without dying on
EOF.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 10:57:26 +09:00
Ben Peart
974b50c556 pkt-line: fix packet_read_line() to handle len < 0 errors
Update packet_read_line() to test for len > 0 to avoid potential bug
if read functions return lengths less than zero to indicate errors.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Found/Fixed-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 10:57:26 +09:00
Lars Schneider
bb643d8bf8 pkt-line: add functions to read/write flush terminated packet streams
write_packetized_from_fd() and write_packetized_from_buf() write a
stream of packets. All content packets use the maximal packet size
except for the last one. After the last content packet a `flush` control
packet is written.

read_packetized_to_strbuf() reads arbitrary sized packets until it
detects a `flush` packet.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:36:50 -07:00
Lars Schneider
edfb780cd4 pkt-line: add packet_write_gently()
packet_write_fmt_gently() uses format_packet() which lets the caller
only send string data via "%s". That means it cannot be used for
arbitrary data that may contain NULs.

Add packet_write_gently() which writes arbitrary data and does not die
in case of an error. The function is used by other pkt-line functions in
a subsequent patch.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:36:50 -07:00
Lars Schneider
038ce90f2f pkt-line: add packet_flush_gently()
packet_flush() would die in case of a write error even though for some
callers an error would be acceptable. Add packet_flush_gently() which
writes a pkt-line flush packet like packet_flush() but does not die in
case of an error. The function is used in a subsequent patch.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:36:50 -07:00
Lars Schneider
70428d1a52 pkt-line: add packet_write_fmt_gently()
packet_write_fmt() would die in case of a write error even though for
some callers an error would be acceptable. Add packet_write_fmt_gently()
which writes a formatted pkt-line like packet_write_fmt() but does not
die in case of an error. The function is used in a subsequent patch.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:36:50 -07:00
Lars Schneider
2f60bdd1a8 pkt-line: extract set_packet_header()
Extracted set_packet_header() function converts an integer to a 4 byte
hex string. Make this function locally available so that other pkt-line
functions could use it.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:36:50 -07:00
Lars Schneider
81c634e94f pkt-line: rename packet_write() to packet_write_fmt()
packet_write() should be called packet_write_fmt() because it is a
printf-like function that takes a format string as first parameter.

packet_write_fmt() should be used for text strings only. Arbitrary
binary data should use a new packet_write() function that is introduced
in a subsequent patch.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-17 11:36:50 -07:00
René Scharfe
d23309733a introduce hex2chr() for converting two hexadecimal digits to a character
Add and use a helper function that decodes the char value of two
hexadecimal digits.  It returns a negative number on error, avoids
running over the end of the given string and doesn't shift negative
values.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 10:42:46 -07:00
Jeff King
fd89433dd0 pkt-line: show packets in async processes as "sideband"
If you run "GIT_TRACE_PACKET=1 git push", you may get
confusing output like (line prefixes omitted for clarity):

   packet:      push< \1000eunpack ok0019ok refs/heads/master0000
   packet:      push< unpack ok
   packet:      push< ok refs/heads/master
   packet:      push< 0000
   packet:      push< 0000

Why do we see the data twice, once apparently wrapped inside
another pkt-line, and once unwrapped? Why do we get two
flush packets?

The answer is that we start an async process to demux the
sideband data. The first entry comes from the sideband
process reading the data, and the second from push itself.
Likewise, the first flush is inside the demuxed packet, and
the second is an actual sideband flush.

We can make this a bit more clear by marking the sideband
demuxer explicitly as "sideband" rather than "push". The
most elegant way to do this would be to simply call
packet_trace_identity() inside the sideband demuxer. But we
can't do that reliably, because it relies on a global
variable, which might be shared if pthreads are in use.

What we really need is thread-local storage for
packet_trace_identity. But the async code does not provide
an interface for that, and it would be messy to add it here
(we'd have to care about pthreads, initializing our
pthread_key_t ahead of time, etc).

So instead, let us just assume that any async process is
handling sideband data. That's always true now, and is
likely to remain so in the future.

The output looks like:

   packet:  sideband< \1000eunpack ok0019ok refs/heads/master0000
   packet:      push< unpack ok
   packet:      push< ok refs/heads/master
   packet:      push< 0000
   packet:  sideband< 0000

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-01 15:11:57 -07:00
Jeff King
323598387d pkt-line: support tracing verbatim pack contents
When debugging the pack protocol, it is sometimes useful to
store the verbatim pack that we sent or received on the
wire. Looking at the on-disk result is often not helpful for
a few reasons:

  1. If the operation is a clone, we destroy the repo on
     failure, leaving nothing on disk.

  2. If the pack is small, we unpack it immediately, and the
     full pack never hits the disk.

  3. If we feed the pack to "index-pack --fix-thin", the
     resulting pack has the extra delta bases added to it.

We already have a GIT_TRACE_PACKET mechanism for tracing
packets. Let's extend it with GIT_TRACE_PACKFILE to dump the
verbatim packfile.

There are a few other positive fallouts that come from
rearranging this code:

 - We currently disable the packet trace after seeing the
   PACK header, even though we may get human-readable lines
   on other sidebands; now we include them in the trace.

 - We currently try to print "PACK ..." in the trace to
   indicate that the packfile has started. But because we
   disable packet tracing, we never printed this line. We
   will now do so.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-16 13:24:22 -07:00
Jeff King
d6d1a75e51 pkt-line: tighten sideband PACK check when tracing
To find the start of the pack data, we accept the word PACK
at the beginning of any sideband channel, even though what
we really want is to find the pack data on channel 1. In
practice this doesn't matter, as sideband-2 messages tend to
start with "error:" or similar, but it is a good idea to be
explicit (especially as we add more code in this area, we
will rely on this assumption).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-15 13:25:52 -07:00
Jeff King
f3612acb93 pkt-line: simplify starts_with checks in packet tracing
We carefully check that our pkt buffer has enough characters
before seeing if it starts with "PACK". The intent is to
avoid reading random memory if we get a short buffer like
"PAC".

However, we know that the traced packets are always
NUL-terminated. They come from one of these sources:

  1. A string literal.

  2. `format_packet`, which uses a strbuf.

  3. `packet_read`, which defensively NUL-terminates what we
     read.

We can therefore drop the length checks, as we know we will
hit the trailing NUL if we have a short input.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-15 13:25:50 -07:00
Jeff King
8e9faf27c1 pkt-line: allow writing of LARGE_PACKET_MAX buffers
When we send out pkt-lines with refnames, we use a static
1000-byte buffer. This means that the maximum size of a ref
over the git protocol is around 950 bytes (the exact size
depends on the protocol line being written, but figure on a sha1
plus some boilerplate).

This is enough for any sane workflow, but occasionally odd
things happen (e.g., a bug may create a ref "foo/foo/foo/..."
accidentally).  With the current code, you cannot even use
"push" to delete such a ref from a remote.

Let's switch to using a strbuf, with a hard-limit of
LARGE_PACKET_MAX (which is specified by the protocol).  This
matches the size of the readers, as of 74543a0 (pkt-line:
provide a LARGE_PACKET_MAX static buffer, 2013-02-20).
Versions of git older than that will complain about our
large packets, but it's really no worse than the current
behavior. Right now the sender barfs with "impossibly long
line" trying to send the packet, and afterwards the reader
will barf with "protocol error: bad line length %d", which
is arguably better anyway.

Note that we're not really _solving_ the problem here, but
just bumping the limits. In theory, the length of a ref is
unbounded, and pkt-line can only represent sizes up to
65531 bytes. So we are just bumping the limit, not removing
it.  But hopefully 64K should be enough for anyone.

As a bonus, by using a strbuf for the formatting we can
eliminate an unnecessary copy in format_buf_write.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-10 13:09:21 -08:00
Karsten Blees
6aa3085702 trace: improve trace performance
The trace API currently rechecks the environment variable and reopens the
trace file on every API call. This has the ugly side effect that errors
(e.g. file cannot be opened, or the user specified a relative path) are
also reported on every call. Performance can be improved by about factor
three by remembering the environment state and keeping the file open.

Replace the 'const char *key' parameter in the API with a pointer to a
'struct trace_key' that bundles the environment variable name with
additional, trace-internal state. Change the call sites of these APIs to
use a static 'struct trace_key' instead of a string constant.

In trace.c::get_trace_fd(), save and reuse the file descriptor in 'struct
trace_key'.

Add a 'trace_disable()' API, so that packet_trace() can cleanly disable
tracing when it encounters packed data (instead of using unsetenv()).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 21:24:23 -07:00
Christian Couder
5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Jeff King
4981fe750b pkt-line: share buffer/descriptor reading implementation
The packet_read function reads from a descriptor. The
packet_get_line function is similar, but reads from an
in-memory buffer, and uses a completely separate
implementation. This patch teaches the generic packet_read
function to accept either source, and we can do away with
packet_get_line's implementation.

There are two other differences to account for between the
old and new functions. The first is that we used to read
into a strbuf, but now read into a fixed size buffer. The
only two callers are fine with that, and in fact it
simplifies their code, since they can use the same
static-buffer interface as the rest of the packet_read_line
callers (and we provide a similar convenience wrapper for
reading from a buffer rather than a descriptor).

This is technically an externally-visible behavior change in
that we used to accept arbitrary sized packets up to 65532
bytes, and now cap out at LARGE_PACKET_MAX, 65520. In
practice this doesn't matter, as we use it only for parsing
smart-http headers (of which there is exactly one defined,
and it is small and fixed-size). And any extension headers
would be breaking the protocol to go over LARGE_PACKET_MAX
anyway.

The other difference is that packet_get_line would return
on error rather than dying. However, both callers of
packet_get_line are actually improved by dying.

The first caller does its own error checking, but we can
drop that; as a result, we'll actually get more specific
reporting about protocol breakage when packet_read dies
internally. The only downside is that packet_read will not
print the smart-http URL that failed, but that's not a big
deal; anybody not debugging can already see the remote's URL
already, and anybody debugging would want to run with
GIT_CURL_VERBOSE anyway to see way more information.

The second caller, which is just trying to skip past any
extra smart-http headers (of which there are none defined,
but which we allow to keep room for future expansion), did
not error check at all. As a result, it would treat an error
just like a flush packet. The resulting mess would generally
cause an error later in get_remote_heads, but now we get
error reporting much closer to the source of the problem.

Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 00:14:15 -08:00
Jeff King
74543a0423 pkt-line: provide a LARGE_PACKET_MAX static buffer
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:

  1. The only variable-sized data in these lines is a ref
     name, and refs tend to be a lot shorter than 1000
     characters.

  2. When sending ref lines, git-core always limits itself
     to 1000 byte packets.

However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.

This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:

  1. Other git implementations may have followed
     protocol-common.txt and used a larger maximum size. We
     don't bump into it in practice because it would involve
     very long ref names.

  2. We may want to increase the 1000-byte limit one day.
     Since packets are transferred before any capabilities,
     it's difficult to do this in a backwards-compatible
     way. But if we bump the size of buffer the readers can
     handle, eventually older versions of git will be
     obsolete enough that we can justify bumping the
     writers, as well. We don't have plans to do this
     anytime soon, but there is no reason not to start the
     clock ticking now.

Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage.  That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 13:42:22 -08:00