Commit graph

83 commits

Author SHA1 Message Date
Junio C Hamano
92251b1b5b Merge branch 'nd/shallow-clone'
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).

* nd/shallow-clone: (31 commits)
  t5537: fix incorrect expectation in test case 10
  shallow: remove unused code
  send-pack.c: mark a file-local function static
  git-clone.txt: remove shallow clone limitations
  prune: clean .git/shallow after pruning objects
  clone: use git protocol for cloning shallow repo locally
  send-pack: support pushing from a shallow clone via http
  receive-pack: support pushing to a shallow clone via http
  smart-http: support shallow fetch/clone
  remote-curl: pass ref SHA-1 to fetch-pack as well
  send-pack: support pushing to a shallow clone
  receive-pack: allow pushes that update .git/shallow
  connected.c: add new variant that runs with --shallow-file
  add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
  receive/send-pack: support pushing from a shallow clone
  receive-pack: reorder some code in unpack()
  fetch: add --update-shallow to accept refs that update .git/shallow
  upload-pack: make sure deepening preserves shallow roots
  fetch: support fetching from a shallow repository
  clone: support remote shallow repository
  ...
2014-01-17 12:21:20 -08:00
Junio C Hamano
1945e8ac85 Merge branch 'tb/clone-ssh-with-colon-for-port'
Be more careful when parsing remote repository URL given in the
scp-style host:path notation.

* tb/clone-ssh-with-colon-for-port:
  git_connect(): use common return point
  connect.c: refactor url parsing
  git_connect(): refactor the port handling for ssh
  git fetch: support host:/~repo
  t5500: add test cases for diag-url
  git fetch-pack: add --diag-url
  git_connect: factor out discovery of the protocol and its parts
  git_connect: remove artificial limit of a remote command
  t5601: add tests for ssh
  t5601: remove clear_ssh, refactor setup_ssh_wrapper
2013-12-17 12:03:32 -08:00
Nguyễn Thái Ngọc Duy
16094885ca smart-http: support shallow fetch/clone
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:18 -08:00
Nguyễn Thái Ngọc Duy
58f2ed051f remote-curl: pass ref SHA-1 to fetch-pack as well
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:18 -08:00
Nguyễn Thái Ngọc Duy
beea4152d9 clone: support remote shallow repository
Cloning from a shallow repository does not follow the "8 steps for new
.git/shallow" because if it does we need to get through step 6 for all
refs. That means commit walking down to the bottom.

Instead the rule to create .git/shallow is simpler and, more
importantly, cheap: if a shallow commit is found in the pack, it's
probably used (i.e. reachable from some refs), so we add it. Others
are dropped.

One may notice this method seems flawed by the word "probably". A
shallow commit may not be reachable from any refs at all if it's
attached to an object island (a group of objects that are not
reachable by any refs).

If that object island is not complete, a new fetch request may send
more objects to connect it to some ref. At that time, because we
incorrectly installed the shallow commit in this island, the user will
not see anything after that commit (fsck is still ok). This is not
desired.

Given that object islands are rare (C Git never sends such islands for
security reasons) and do not really harm the repository integrity, a
tradeoff is made to surprise the user occasionally but work faster
everyday.

A new option --strict could be added later that follows exactly the 8
steps. "git prune" can also learn to remove dangling objects _and_ the
shallow commits that are attached to them from .git/shallow.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:17 -08:00
Nguyễn Thái Ngọc Duy
b06dcd7d68 connect.c: teach get_remote_heads to parse "shallow" lines
No callers pass a non-empty pointer as shallow_points at this
stage. As a result, all clients still refuse to talk to shallow
repository on the other end.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:16 -08:00
Torsten Bögershausen
5610b7c0c6 git fetch-pack: add --diag-url
The main purpose is to trace the URL parser called by git_connect() in
connect.c

The main features of the parser can be listed as this:

- parse out host and path for URLs with a scheme (git:// file:// ssh://)
- parse host names embedded by [] correctly
- extract the port number, if present
- separate URLs like "file" (which are local)
  from URLs like "host:repo" which should use ssh

Add the new parameter "--diag-url" to "git fetch-pack", which prints
the value for protocol, host and path to stderr and exits.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-09 14:54:47 -08:00
Christian Couder
5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Junio C Hamano
2233ad4534 Merge branch 'jc/push-cas'
Allow a safer "rewind of the remote tip" push than blind "--force",
by requiring that the overwritten remote ref to be unchanged since
the new history to replace it was prepared.

The machinery is more or less ready.  The "--force" option is again
the big red button to override any safety, thanks to J6t's sanity
(the original round allowed --lockref to defeat --force).

The logic to choose the default implemented here is fragile
(e.g. "git fetch" after seeing a failure will update the
remote-tracking branch and will make the next "push" pass,
defeating the safety pretty easily).  It is suitable only for the
simplest workflows, and it may hurt users more than it helps them.

* jc/push-cas:
  push: teach --force-with-lease to smart-http transport
  send-pack: fix parsing of --force-with-lease option
  t5540/5541: smart-http does not support "--force-with-lease"
  t5533: test "push --force-with-lease"
  push --force-with-lease: tie it all together
  push --force-with-lease: implement logic to populate old_sha1_expect[]
  remote.c: add command line option parser for "--force-with-lease"
  builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN
  cache.h: move remote/connect API out of it
2013-09-09 14:30:29 -07:00
Nguyễn Thái Ngọc Duy
9ba380481c smart http: use the same connectivity check on cloning
This is an extension of c6807a4 (clone: open a shortcut for
connectivity check - 2013-05-26) to reduce the cost of connectivity
check at clone time, this time with smart http protocol.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-23 12:18:18 -07:00
Junio C Hamano
47a5918536 cache.h: move remote/connect API out of it
The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me.  This structure is not
about the local ref used by sha1-name API to name local objects.

It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values.  It belongs to "remote.h" together
with "struct refspec".

While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-08 14:34:24 -07:00
Junio C Hamano
e013bdab0f Merge branch 'jk/pkt-line-cleanup'
Clean up pkt-line API, implementation and its callers to make them
more robust.

* jk/pkt-line-cleanup:
  do not use GIT_TRACE_PACKET=3 in tests
  remote-curl: always parse incoming refs
  remote-curl: move ref-parsing code up in file
  remote-curl: pass buffer straight to get_remote_heads
  teach get_remote_heads to read from a memory buffer
  pkt-line: share buffer/descriptor reading implementation
  pkt-line: provide a LARGE_PACKET_MAX static buffer
  pkt-line: move LARGE_PACKET_MAX definition from sideband
  pkt-line: teach packet_read_line to chomp newlines
  pkt-line: provide a generic reading function with options
  pkt-line: drop safe_write function
  pkt-line: move a misplaced comment
  write_or_die: raise SIGPIPE when we get EPIPE
  upload-archive: use argv_array to store client arguments
  upload-archive: do not copy repo name
  send-pack: prefer prefixcmp over memcmp in receive_status
  fetch-pack: fix out-of-bounds buffer offset in get_ack
  upload-pack: remove packet debugging harness
  upload-pack: do not add duplicate objects to shallow list
  upload-pack: use get_sha1_hex to parse "shallow" lines
2013-04-01 08:59:37 -07:00
Jeff King
85edf4f58b teach get_remote_heads to read from a memory buffer
Now that we can read packet data from memory as easily as a
descriptor, get_remote_heads can take either one as a
source. This will allow further refactoring in remote-curl.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-24 00:17:38 -08:00
Jeff King
74543a0423 pkt-line: provide a LARGE_PACKET_MAX static buffer
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:

  1. The only variable-sized data in these lines is a ref
     name, and refs tend to be a lot shorter than 1000
     characters.

  2. When sending ref lines, git-core always limits itself
     to 1000 byte packets.

However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.

This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:

  1. Other git implementations may have followed
     protocol-common.txt and used a larger maximum size. We
     don't bump into it in practice because it would involve
     very long ref names.

  2. We may want to increase the 1000-byte limit one day.
     Since packets are transferred before any capabilities,
     it's difficult to do this in a backwards-compatible
     way. But if we bump the size of buffer the readers can
     handle, eventually older versions of git will be
     obsolete enough that we can justify bumping the
     writers, as well. We don't have plans to do this
     anytime soon, but there is no reason not to start the
     clock ticking now.

Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage.  That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 13:42:22 -08:00
Jeff King
819b929d33 pkt-line: teach packet_read_line to chomp newlines
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.

This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.

As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.

Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility.  However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.

This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.

By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch.  The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 13:42:21 -08:00
Junio C Hamano
f2db854d24 fetch: use struct ref to represent refs to be fetched
Even though "git fetch" has full infrastructure to parse refspecs to
be fetched and match them against the list of refs to come up with
the final list of refs to be fetched, the list of refs that are
requested to be fetched were internally converted to a plain list of
strings at the transport layer and then passed to the underlying
fetch-pack driver.

Stop this conversion and instead pass around an array of refs.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-07 13:53:59 -08:00
Nguyễn Thái Ngọc Duy
745f7a8cac fetch-pack: move core code to libgit.a
fetch_pack() is used by transport.c, part of libgit.a while it stays
in builtin/fetch-pack.c. Move it to fetch-pack.c so that we won't get
undefined reference if a program that uses libgit.a happens to pull it
in.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
2012-10-29 03:40:29 -04:00
Nguyễn Thái Ngọc Duy
f8eb3036d0 fetch-pack: remove global (static) configuration variable "args"
This helps removes the hack in fetch_pack() that copies my_args to args.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
2012-10-29 03:08:30 -04:00
Michael Haggerty
5f0fc64513 fetch-pack: eliminate spurious error messages
It used to be that if "--all", "--depth", and also explicit references
were sought, then the explicit references were not handled correctly
in filter_refs() because the "--all --depth" code took precedence over
the explicit reference handling, and the explicit references were
never noted as having been found.  So check for explicitly sought
references before proceeding to the "--all --depth" logic.

This fixes two test cases in t5500.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:32 -07:00
Michael Haggerty
b285668dd2 cmd_fetch_pack(): simplify computation of return value
Set the final value at initialization rather than initializing it then
sometimes changing it.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:32 -07:00
Michael Haggerty
778e7543d2 fetch-pack: report missing refs even if no existing refs were received
This fixes a test in t5500.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:32 -07:00
Michael Haggerty
7418f1a037 cmd_fetch_pack(): return early if finish_connect() fails
This simplifies the logic without changing the behavior.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty
f537cfa750 filter_refs(): simplify logic
Simplify flow within loop: first decide whether to keep the reference,
then keep/free it.  This makes it clearer that each ref has exactly
two possible destinies, and removes duplication of the code for
appending the reference to the linked list.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty
5096e48753 filter_refs(): build refs list as we go
Instead of temporarily storing matched refs to temporary array
"return_refs", simply append them to newlist as we go.  This changes
the order of references in newlist to strictly sorted if "--all" and
"--depth" and named references are all specified, but that usage is
broken anyway (see the last two tests in t5500).

This changes the last test in t5500 from segfaulting into just
emitting a spurious error (this will be fixed in a moment).

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty
4ba159996f filter_refs(): delete matched refs from sought list
Remove any references that are available from the remote from the
sought list (rather than overwriting their names with NUL characters,
as previously).  Mark matching entries by writing a non-NULL pointer
to string_list_item::util during the iteration, then use
filter_string_list() later to filter out the entries that have been
marked.

Document this aspect of fetch_pack() in a comment in the header file.
(More documentation is obviously still needed.)

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty
4c58f13ba6 fetch_pack(): update sought->nr to reflect number of unique entries
fetch_pack() removes duplicates from the "sought" list, thereby
shrinking the list.  But previously, the caller was not informed about
the shrinkage.  This would cause a spurious error message to be
emitted by cmd_fetch_pack() if "git fetch-pack" is called with
duplicate refnames.

Instead, remove duplicates using string_list_remove_duplicates(),
which adjusts sought->nr to reflect the new length of the list.

The last test of t5500 inexplicably *required* "git fetch-pack" to
fail when fetching a list of references that contains duplicates;
i.e., it insisted on the buggy behavior.  So change the test to expect
the correct behavior.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty
382a967114 filter_refs(): do not check the same sought_pos twice
Once a match has been found at sought_pos, the entry is zeroed and no
future attempts will match that entry.  So increment sought_pos to
avoid checking against the zeroed-out entry during the next iteration.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty
8bee93dd24 Change fetch_pack() and friends to take string_list arguments
Instead of juggling <nr_heads,heads> (sometimes called
<nr_match,match>), pass around the list of references to be sought in
a single string_list variable called "sought".  Future commits will
make more use of string_list functionality.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Michael Haggerty
63c694534b fetch_pack(): reindent function decl and defn
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-12 11:46:31 -07:00
Jeff King
36c60f7a08 fetch-pack: mention server version with verbose output
Fetch-pack's verbose mode is more of a debugging mode (and
in fact takes two "-v" arguments to trigger via the
porcelain layer). Let's mention the server version as
another possible item of interest.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-13 21:56:05 -07:00
Junio C Hamano
74991a98df fetch-pack: do not ask for unadvertised capabilities
In the same spirit as the previous fix, stop asking for thin-pack, no-progress
and include-tag capabilities when the other end does not claim to support them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-10 14:27:52 -07:00
Jeff King
d50c387163 do not send client agent unless server does first
Commit ff5effdf taught both clients and servers of the git protocol
to send an "agent" capability that just advertises their version for
statistics and debugging purposes.  The protocol-capabilities.txt
document however indicates that the client's advertisement is
actually a response, and should never include capabilities not
mentioned in the server's advertisement.

Adding the unconditional advertisement in the server programs was
OK, then, but the clients broke the protocol.  The server
implementation of git-core itself does not care, but at least one
does: the Google Code git server (or any server using Dulwich), will
hang up with an internal error upon seeing an unknown capability.

Instead, each client must record whether we saw an agent string from
the server, and respond with its agent only if the server mentioned
it first.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-10 12:35:13 -07:00
Jeff King
ff5effdf45 include agent identifier in capability string
Instead of having the client advertise a particular version
number in the git protocol, we have managed extensions and
backwards compatibility by having clients and servers
advertise capabilities that they support. This is far more
robust than having each side consult a table of
known versions, and provides sufficient information for the
protocol interaction to complete.

However, it does not allow servers to keep statistics on
which client versions are being used. This information is
not necessary to complete the network request (the
capabilities provide enough information for that), but it
may be helpful to conduct a general survey of client
versions in use.

We already send the client version in the user-agent header
for http requests; adding it here allows us to gather
similar statistics for non-http requests.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-03 13:03:34 -07:00
Junio C Hamano
12d7d15074 Merge branch 'jk/fetch-pack-remove-dups-optim'
The way "fetch-pack" that is given multiple references to fetch tried to
remove duplicates was very inefficient.

By Jeff King
* jk/fetch-pack-remove-dups-optim:
  fetch-pack: sort incoming heads list earlier
  fetch-pack: avoid quadratic loop in filter_refs
  fetch-pack: sort the list of incoming refs
  add sorting infrastructure for list refs
  fetch-pack: avoid quadratic behavior in remove_duplicates
  fetch-pack: sort incoming heads
2012-05-29 13:09:08 -07:00
Junio C Hamano
4dbfaee0c7 Merge branch 'mh/fetch-pack-constness'
Tighten constness of some local variables in a callchain.

By Michael Haggerty
* mh/fetch-pack-constness:
  cmd_fetch_pack(): respect constness of argv parameter
  cmd_fetch_pack(): combine the loop termination conditions
  cmd_fetch_pack(): handle non-option arguments outside of the loop
  cmd_fetch_pack(): declare dest to be const
2012-05-29 13:08:53 -07:00
Jeff King
3d2a33e57f fetch-pack: sort incoming heads list earlier
Commit 4435968 started sorting heads fed to fetch-pack so
that later commits could use more optimized algorithms;
commit 7db8d53 switched the remove_duplicates function to
such an algorithm.

Of course, the sorting is more effective if you do it
_before_ the algorithm in question.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-24 10:02:37 -07:00
Jeff King
a0de28805d fetch-pack: avoid quadratic loop in filter_refs
We have a list of refs that we want to compare against the
"match" array. The current code searches the match list
linearly, giving quadratic behavior over the number of refs
when you want to fetch all of them.

Instead, we can compare the lists as we go, giving us linear
behavior.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 13:31:03 -07:00
Jeff King
9e8e704f0b fetch-pack: sort the list of incoming refs
Having the list sorted means we can avoid some quadratic
algorithms when comparing lists.

These should typically be sorted already, but they do come
from the remote, so let's be extra careful. Our ref-sorting
implementation does a mergesort, so we do not have to care
about performance degrading in the common case that the list
is already sorted.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 13:31:03 -07:00
Jeff King
7db8d5370f fetch-pack: avoid quadratic behavior in remove_duplicates
We remove duplicate entries from the list of refs we are
fed in fetch-pack. The original algorithm is quadratic over
the number of refs, but since the list is now guaranteed to
be sorted, we can do it in linear time.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 13:31:03 -07:00
Jeff King
443596850f fetch-pack: sort incoming heads
There's no reason to preserve the incoming order of the
heads we're requested to fetch. By having them sorted, we
can replace some of the quadratic algorithms with linear
ones.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 13:31:03 -07:00
Michael Haggerty
57e6fc6958 cmd_fetch_pack(): respect constness of argv parameter
The old code cast away the constness of the strings passed to the
function in argument argv[], which could result in their being
modified by filter_refs().  Fix by copying reference names from argv
and putting them into our own array (similarly to how refnames passed
to stdin were already handled).

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 12:57:20 -07:00
Michael Haggerty
ff22ff9909 cmd_fetch_pack(): combine the loop termination conditions
If an argument that does not start with '-' is found, the loop is
terminated.  So move that check into the for-loop condition.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 12:57:20 -07:00
Michael Haggerty
4cc00fcf5d cmd_fetch_pack(): handle non-option arguments outside of the loop
This makes it more obvious that the code is always executed unless
there is an error, and that the first initialization of nr_heads is
unnecessary.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 12:57:19 -07:00
Michael Haggerty
9d19c6ea52 cmd_fetch_pack(): declare dest to be const
There is no need for it to be non-const, and this avoids the need
for casting away the constness of an argv element.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 12:57:19 -07:00
Junio C Hamano
77cab8af4a Merge branch 'it/fetch-pack-many-refs'
When "git fetch" encounters repositories with too many references, the
command line of "fetch-pack" that is run by a helper e.g. remote-curl,
may fail to hold all of them. Now such an internal invocation can feed
the references through the standard input of "fetch-pack".

By Ivan Todoroski
* it/fetch-pack-many-refs:
  remote-curl: main test case for the OS command line overflow
  fetch-pack: test cases for the new --stdin option
  remote-curl: send the refs to fetch-pack on stdin
  fetch-pack: new --stdin option to read refs from stdin
2012-04-24 14:40:51 -07:00
Ivan Todoroski
078b895fef fetch-pack: new --stdin option to read refs from stdin
If a remote repo has too many tags (or branches), cloning it over the
smart HTTP transport can fail because remote-curl.c puts all the refs
from the remote repo on the fetch-pack command line. This can make the
command line longer than the global OS command line limit, causing
fetch-pack to fail.

This is especially a problem on Windows where the command line limit is
orders of magnitude shorter than Linux. There are already real repos out
there that msysGit cannot clone over smart HTTP due to this problem.

Here is an easy way to trigger this problem:

	git init too-many-refs
	cd too-many-refs
	echo bla > bla.txt
	git add .
	git commit -m test
	sha=$(git rev-parse HEAD)
	tag=$(perl -e 'print "bla" x 30')
	for i in `seq 50000`; do
		echo $sha refs/tags/$tag-$i >> .git/packed-refs
	done

Then share this repo over the smart HTTP protocol and try cloning it:

	$ git clone http://localhost/.../too-many-refs/.git
	Cloning into 'too-many-refs'...
	fatal: cannot exec 'fetch-pack': Argument list too long

50k tags is obviously an absurd number, but it is required to
demonstrate the problem on Linux because it has a much more generous
command line limit. On Windows the clone fails with as little as 500
tags in the above loop, which is getting uncomfortably close to the
number of tags you might see in real long lived repos.

This is not just theoretical, msysGit is already failing to clone our
company repo due to this. It's a large repo converted from CVS, nearly
10 years of history.

Four possible solutions were discussed on the Git mailing list (in no
particular order):

1) Call fetch-pack multiple times with smaller batches of refs.

This was dismissed as inefficient and inelegant.

2) Add option --refs-fd=$n to pass a an fd from where to read the refs.

This was rejected because inheriting descriptors other than
stdin/stdout/stderr through exec() is apparently problematic on Windows,
plus it would require changes to the run-command API to open extra
pipes.

3) Add option --refs-from=$tmpfile to pass the refs using a temp file.

This was not favored because of the temp file requirement.

4) Add option --stdin to pass the refs on stdin, one per line.

In the end this option was chosen as the most efficient and most
desirable from scripting perspective.

There was however a small complication when using stdin to pass refs to
fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
communication with the remote server.

If we are going to sneak refs on stdin line by line, it would have to be
done very carefully in the presence of --stateless-rpc, because when
reading refs line by line we might read ahead too much data into our
buffer and eat some of the remote protocol data which is also coming on
stdin.

One way to solve this would be to refactor get_remote_heads() in
fetch-pack.c to accept a residual buffer from our stdin line parsing
above, but this function is used in several places so other callers
would be burdened by this residual buffer interface even when most of
them don't need it.

In the end we settled on the following solution:

If --stdin is specified without --stateless-rpc, fetch-pack would read
the refs from stdin one per line, in a script friendly format.

However if --stdin is specified together with --stateless-rpc,
fetch-pack would read the refs from stdin in packetized format
(pkt-line) with a flush packet terminating the list of refs. This way we
can read the exact number of bytes that we need from stdin, and then
get_remote_heads() can continue reading from the same fd without losing
a single byte of remote protocol data.

This way the --stdin option only loses generality and scriptability when
used together with --stateless-rpc, which is not easily scriptable
anyway because it also uses pkt-line when talking to the remote server.

Signed-off-by: Ivan Todoroski <grnch@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-02 13:47:15 -07:00
Junio C Hamano
592d051759 Merge branch 'cb/transfer-no-progress'
* cb/transfer-no-progress:
  push/fetch/clone --no-progress suppresses progress output
2012-02-20 00:14:55 -08:00
Clemens Buchacher
01fdc21f6e push/fetch/clone --no-progress suppresses progress output
By default, progress output is disabled if stderr is not a terminal.
The --progress option can be used to force progress output anyways.
Conversely, --no-progress does not force progress output. In particular,
if stderr is a terminal, progress output is enabled.

This is unintuitive. Change --no-progress to force output off.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-13 13:06:53 -08:00
Michael Haggerty
f257659132 everything_local(): mark alternate refs as complete
Objects in an alternate object database are already available to the
local repository and therefore don't need to be fetched.  So mark them
as complete in everything_local().

This fixes a test in t5700.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-12 19:50:39 -08:00
Michael Haggerty
c41a802fe9 fetch-pack.c: inline insert_alternate_refs()
The logic of the (single) caller is clearer without encapsulating this
one line in a function.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-12 19:50:39 -08:00