Commit graph

488 commits

Author SHA1 Message Date
Ævar Arnfjörð Bjarmason 28a592e4f4 serve.[ch]: don't pass "struct strvec *keys" to commands
The serve.c API added in ed10cb952d (serve: introduce git-serve,
2018-03-15) was passing in the raw capabilities "keys", but nothing
downstream of it ever used them.

Let's remove that code because it's not needed. If we do end up
needing to pass information about the advertisement in the future
it'll make more sense to have serve.c parse the capabilities keys and
pass the result of its parsing, rather than expecting expecting its
API users to parse the same keys again.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-05 08:59:37 -07:00
Junio C Hamano 644f4a2046 Merge branch 'jt/push-negotiation'
"git push" learns to discover common ancestor with the receiving
end over protocol v2.

* jt/push-negotiation:
  send-pack: support push negotiation
  fetch: teach independent negotiation (no packfile)
  fetch-pack: refactor command and capability write
  fetch-pack: refactor add_haves()
  fetch-pack: refactor process_acks()
2021-05-16 21:05:22 +09:00
Jonathan Tan 9c1e657a8f fetch: teach independent negotiation (no packfile)
Currently, the packfile negotiation step within a Git fetch cannot be
done independent of sending the packfile, even though there is at least
one application wherein this is useful. Therefore, make it possible for
this negotiation step to be done independently. A subsequent commit will
use this for one such application - push negotiation.

This feature is for protocol v2 only. (An implementation for protocol v0
would require a separate implementation in the fetch, transport, and
transport helper code.)

In the protocol, the main hindrance towards independent negotiation is
that the server can unilaterally decide to send the packfile. This is
solved by a "wait-for-done" argument: the server will then wait for the
client to say "done". In practice, the client will never say it; instead
it will cease requests once it is satisfied.

In the client, the main change lies in the transport and transport
helper code. fetch_refs_via_pack() performs everything needed - protocol
version and capability checks, and the negotiation itself.

There are 2 code paths that do not go through fetch_refs_via_pack() that
needed to be individually excluded: the bundle transport (excluded
through requiring smart_options, which the bundle transport doesn't
support) and transport helpers that do not support takeover. If or when
we support independent negotiation for protocol v0, we will need to
modify these 2 code paths to support it. But for now, report failure if
independent negotiation is requested in these cases.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 10:41:29 +09:00
Jeff King 45a187cc34 lookup_unknown_object(): take a repository argument
All of the other lookup_foo() functions take a repository argument, but
lookup_unknown_object() was never converted, and it uses the_repository
internally. Let's fix that.

We could leave a wrapper that uses the_repository, but there aren't that
many calls, so we'll just convert them all. I looked briefly at each
site to see if we had a repository struct (besides the_repository) we
could pass, but none of them do (so this conversion to pass
the_repository is a pure noop in each case, though it does take us one
step closer to eventually getting rid of the_repository).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-13 13:18:46 -07:00
Junio C Hamano 8b4701ae4f Merge branch 'ak/corrected-commit-date'
The commit-graph learned to use corrected commit dates instead of
the generation number to help topological revision traversal.

* ak/corrected-commit-date:
  doc: add corrected commit date info
  commit-reach: use corrected commit dates in paint_down_to_common()
  commit-graph: use generation v2 only if entire chain does
  commit-graph: implement generation data chunk
  commit-graph: implement corrected commit date
  commit-graph: return 64-bit generation number
  commit-graph: add a slab to store topological levels
  t6600-test-reach: generalize *_three_modes
  commit-graph: consolidate fill_commit_graph_info
  revision: parse parent in indegree_walk_step()
  commit-graph: fix regression when computing Bloom filters
2021-02-17 17:21:40 -08:00
Junio C Hamano 60f8121940 Merge branch 'jv/upload-pack-filter-spec-quotefix'
Fix in passing custom args from "git clone" to "upload-pack" on the
other side.

* jv/upload-pack-filter-spec-quotefix:
  t5544: clarify 'hook works with partial clone' test
  upload-pack.c: fix filter spec quoting bug
2021-02-12 14:21:04 -08:00
Jacob Vosmaer ad5df6b782 upload-pack.c: fix filter spec quoting bug
Fix a bug in upload-pack.c that occurs when you combine partial
clone and uploadpack.packObjectsHook. You can reproduce it as
follows:

    git clone -u 'git -c uploadpack.allowfilter '\
	'-c uploadpack.packobjectshook=env '\
	'upload-pack' --filter=blob:none --no-local \
	src.git dst.git

Be careful with the line endings because this has a long quoted
string as the -u argument.

The error I get when I run this is:

	Cloning into '/tmp/broken'...
	remote: fatal: invalid filter-spec ''blob:none''
	error: git upload-pack: git-pack-objects died with error.
	fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
	remote: aborting due to possible repository corruption on the remote side.
	fatal: early EOF
	fatal: index-pack failed

The problem is caused by unneeded quoting.

This bug was already present in 10ac85c785 (upload-pack: add object
filtering for partial clone, 2017-12-08) when the server side filter
support was introduced.  In fact, in 10ac85c785 this was broken
regardless of uploadpack.packObjectsHook. Then in 0b6069fe0a
(fetch-pack: test support excluding large blobs, 2017-12-08) the
quoting was removed but only behind a conditional that depends on
whether uploadpack.packObjectsHook is set.

Because uploadpack.packObjectsHook is apparently rarely used, nobody
noticed the problematic quoting could still happen.

Remove the conditional quoting and add a test for partial clone in
t5544-pack-objects-hook.

Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-28 09:40:24 -08:00
Jeff King 36a317929b refs: switch peel_ref() to peel_iterated_oid()
The peel_ref() interface is confusing and error-prone:

  - it's typically used by ref iteration callbacks that have both a
    refname and oid. But since they pass only the refname, we may load
    the ref value from the filesystem again. This is inefficient, but
    also means we are open to a race if somebody simultaneously updates
    the ref. E.g., this:

      int some_ref_cb(const char *refname, const struct object_id *oid, ...)
      {
              if (!peel_ref(refname, &peeled))
                      printf("%s peels to %s",
                             oid_to_hex(oid), oid_to_hex(&peeled);
      }

    could print nonsense. It is correct to say "refname peels to..."
    (you may see the "before" value or the "after" value, either of
    which is consistent), but mentioning both oids may be mixing
    before/after values.

    Worse, whether this is possible depends on whether the optimization
    to read from the current iterator value kicks in. So it is actually
    not possible with:

      for_each_ref(some_ref_cb);

    but it _is_ possible with:

      head_ref(some_ref_cb);

    which does not use the iterator mechanism (though in practice, HEAD
    should never peel to anything, so this may not be triggerable).

  - it must take a fully-qualified refname for the read_ref_full() code
    path to work. Yet we routinely pass it partial refnames from
    callbacks to for_each_tag_ref(), etc. This happens to work when
    iterating because there we do not call read_ref_full() at all, and
    only use the passed refname to check if it is the same as the
    iterator. But the requirements for the function parameters are quite
    unclear.

Instead of taking a refname, let's instead take an oid. That fixes both
problems. It's a little funny for a "ref" function not to involve refs
at all. The key thing is that it's optimizing under the hood based on
having access to the ref iterator. So let's change the name to make it
clear why you'd want this function versus just peel_object().

There are two other directions I considered but rejected:

  - we could pass the peel information into the each_ref_fn callback.
    However, we don't know if the caller actually wants it or not. For
    packed-refs, providing it is essentially free. But for loose refs,
    we actually have to peel the object, which would be wasteful in most
    cases. We could likewise pass in a flag to the callback indicating
    whether the peeled information is known, but that complicates those
    callbacks, as they then have to decide whether to manually peel
    themselves. Plus it requires changing the interface of every
    callback, whether they care about peeling or not, and there are many
    of them.

  - we could make a function to return the peeled value of the current
    iterated ref (computing it if necessary), and BUG() otherwise. I.e.:

      int peel_current_iterated_ref(struct object_id *out);

    Each of the current callers is an each_ref_fn callback, so they'd
    mostly be happy. But:

      - we use those callbacks with functions like head_ref(), which do
        not use the iteration code. So we'd need to handle the fallback
        case there, anyway.

      - it's possible that a caller would want to call into generic code
        that sometimes is used during iteration and sometimes not. This
        encapsulates the logic to do the fast thing when possible, and
        fallback when necessary.

The implementation is mostly obvious, but I want to call out a few
things in the patch:

  - the test-tool coverage for peel_ref() is now meaningless, as it all
    collapses to a single peel_object() call (arguably they were pretty
    uninteresting before; the tricky part of that function is the
    fast-path we see during iteration, but these calls didn't trigger
    that). I've just dropped it entirely, though note that some other
    tests relied on the tags we created; I've moved that creation to the
    tests where it matters.

  - we no longer need to take a ref_store parameter, since we'd never
    look up a ref now. We do still rely on a global "current iterator"
    variable which _could_ be kept per-ref-store. But in practice this
    is only useful if there are multiple recursive iterations, at which
    point the more appropriate solution is probably a stack of
    iterators. No caller used the actual ref-store parameter anyway
    (they all call the wrapper that passes the_repository).

  - the original only kicked in the optimization when the "refname"
    pointer matched (i.e., not string comparison). We do likewise with
    the "oid" parameter here, but fall back to doing an actual oideq()
    call. This in theory lets us kick in the optimization more often,
    though in practice no current caller cares. It should never be
    wrong, though (peeling is a property of an object, so two refs
    pointing to the same object would peel identically).

  - the original took care not to touch the peeled out-parameter unless
    we found something to put in it. But no caller cares about this, and
    anyway, it is enforced by peel_object() itself (and even in the
    optimized iterator case, that's where we eventually end up). We can
    shorten the code and avoid an extra copy by just passing the
    out-parameter through the stack.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21 15:51:31 -08:00
Abhishek Kumar d7f92784c6 commit-graph: return 64-bit generation number
In a preparatory step for introducing corrected commit dates, let's
return timestamp_t values from commit_graph_generation(), use
timestamp_t for local variables and define GENERATION_NUMBER_INFINITY
as (2 ^ 63 - 1) instead.

We rename GENERATION_NUMBER_MAX to GENERATION_NUMBER_V1_MAX to
represent the largest topological level we can store in the commit data
chunk.

With corrected commit dates implemented, we will have two such *_MAX
variables to denote the largest offset and largest topological level
that can be stored.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-18 16:21:18 -08:00
Junio C Hamano 21127fa982 Merge branch 'tb/partial-clone-filters-fix'
Fix potential server side resource deallocation issues when
responding to a partial clone request.

* tb/partial-clone-filters-fix:
  upload-pack.c: don't free allowed_filters util pointers
  builtin/clone.c: don't ignore transport_fetch_refs() errors
2020-12-17 15:06:40 -08:00
Junio C Hamano 3c9f0df16a Merge branch 'jk/multi-line-indent-style-fix'
Style fix.

* jk/multi-line-indent-style-fix:
  style: indent multiline "if" conditions to align
2020-12-14 10:21:38 -08:00
Junio C Hamano a5e74b4baa Merge branch 'jk/check-config-parsing-error-in-upload-pack'
Tighten error checking in the codepath that responds to "git fetch".

* jk/check-config-parsing-error-in-upload-pack:
  upload-pack: propagate return value from object filter config callback
2020-12-14 10:21:37 -08:00
Junio C Hamano 01b8886a62 Merge branch 'js/trace2-session-id'
The transport layer was taught to optionally exchange the session
ID assigned by the trace2 subsystem during fetch/push transactions.

* js/trace2-session-id:
  receive-pack: log received client session ID
  send-pack: advertise session ID in capabilities
  upload-pack, serve: log received client session ID
  fetch-pack: advertise session ID in capabilities
  transport: log received server session ID
  serve: advertise session ID in v2 capabilities
  receive-pack: advertise session ID in v0 capabilities
  upload-pack: advertise session ID in v0 capabilities
  trace2: add a public function for getting the SID
  docs: new transfer.advertiseSID option
  docs: new capability to advertise session IDs
2020-12-08 15:11:20 -08:00
Taylor Blau 8d133f500a upload-pack.c: don't free allowed_filters util pointers
To keep track of which object filters are allowed or not, 'git
upload-pack' stores the name of each filter in a string_list, and sets
it ->util pointer to be either 0 or 1, indicating whether it is banned
or allowed.

Later on, we attempt to clear that list, but we incorrectly ask for the
util pointers to be free()'d, too. This behavior (introduced back in
6dd3456a8c (upload-pack.c: allow banning certain object filter(s),
2020-08-03)) leads to an invalid free, and causes us to crash.

In order to trigger this, one needs to fetch from a server that (a) has
at least one object filter allowed, and (b) issue a fetch that contains
a subset of the allowed filters (i.e., we cannot ask for a banned
filter, since this causes us to die() before we hit the bogus
string_list_clear()).

In that case, whatever banned filters exist will cause a noop free()
(since those ->util pointers are set to 0), but the first allowed filter
we try to free will crash us.

We never noticed this in the tests because we didn't have an example of
setting 'uploadPackFilter' configuration variables and then following up
with a valid fetch. The first new 'git clone' prevents further
regression here. For good measure on top, add a test which checks the
same behavior at a tree depth greater than 0.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-03 12:42:33 -08:00
Jeff King 08e9df2395 style: indent multiline "if" conditions to align
Commit 6dc905d974 (config: split repo scope to local and worktree,
2020-02-10) made some "if" statements multiline, but didn't indent the
second lines in our usual way.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-03 10:32:32 -08:00
Jeff King d43a21bdbb upload-pack: propagate return value from object filter config callback
If we encounter an error in parse_filter_object_config(), we'll complain
to stderr but won't actually propagate the return value up the stack.
This is unlike most of our config callbacks, which return the error to
git_config() so it can die (this includes the call just below us to
parse_hide_refs_config(), which can also produce errors).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-03 10:25:13 -08:00
Junio C Hamano f3a112a75e Merge branch 'jk/stop-pack-objects-when-fetch-is-killed'
"git fetch" that is killed may leave a pack-objects process behind,
still computing to find a good compression, wasting cycles.  This
has been corrected.

* jk/stop-pack-objects-when-fetch-is-killed:
  upload-pack: kill pack-objects helper on signal or exit
2020-12-03 00:18:07 -08:00
Jeff King 309a4028e7 upload-pack: kill pack-objects helper on signal or exit
We spawn an external pack-objects process to actually send objects to
the remote side. If we are killed by a signal during this process, then
pack-objects may continue to run. As soon as it starts producing output
for the pack, it will see a failure writing to upload-pack and exit
itself. But before then, it may do significant work traversing the
object graph, compressing deltas, etc, which will all be pointless. So
let's make sure to kill as soon as we know that the caller will not read
the result.

There's no test here, since it's inherently racy, but here's an easy
reproduction is on a large-ish repo like linux.git:

  - make sure you don't have pack bitmaps (since they make the enumerating
    phase go quickly). For linux.git it takes ~30s or so to walk the
    whole graph on my machine.

  - run "git clone --no-local -q . dst"; the "-q" is important because
    if pack-objects is writing progress to upload-pack (to get
    multiplexed over the sideband to the client), then it will notice
    pretty quickly the failure to write to stderr

  - kill the client-side clone process in another terminal (don't use
    ^C, as that will send SIGINT to all of the processes)

  - run "ps au | grep git" or similar to observe upload-pack dying
    within 5 seconds (it will send a keepalive that will notice the
    client has gone away)

  - but you'll still see pack-objects consuming 100% CPU (and 1GB+ of
    RAM) during the traversal and delta compression phases. It will exit
    as soon as it starts to write the pack (when it will notice that
    upload-pack went away).

With this patch, pack-objects exits as soon as upload-pack does.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-01 12:05:58 -08:00
Josh Steadmon 829594677c upload-pack, serve: log received client session ID
When upload-pack (protocol v0/v1) or a protocol v2 server receives a
session-id capability from a client, log the received session ID via a
trace2 data event.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 18:26:53 -08:00
Josh Steadmon 791e1adf22 upload-pack: advertise session ID in v0 capabilities
When transfer.advertiseSID is true, advertise upload-pack's session ID
via the new session-id capability.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 18:26:52 -08:00
Daniel Duvall fb3d1a083f upload-pack: allow stateless client EOF just prior to haves
During stateless packfile negotiation where a depth is given, stateless
RPC clients (e.g. git-remote-curl) will send multiple upload-pack
requests with the first containing only the
wants/shallows/deepens/filters and the subsequent containing haves/done.

When upload-pack handles such requests, entering get_common_commits
without checking whether the client has hung up can result in unexpected
EOF during the negotiation loop and a die() with message "fatal: the
remote end hung up unexpectedly".

Real world effects include:

 - A client speaking to git-http-backend via a server that doesn't check
   the exit codes of CGIs (e.g. mod_cgi) doesn't know and doesn't care
   about the fatal. It continues to process the response body as normal.

 - A client speaking to a server that does check the exit code and
   returns an errant HTTP status as a result will fail with the message
   "error: RPC failed; HTTP 500 curl 22 The requested URL returned error:
   500."

 - Admins running servers that surface the failure must workaround it by
   patching code that handles execution of git-http-backend to ignore exit
   codes or take other heuristic approaches.

 - Admins may have to deal with "hung up unexpectedly" log spam related
   to the failures even in cases where the exit code isn't surfaced as an
   HTTP server-side error status.

To avoid these EOF related fatals, have upload-pack gently peek for an
EOF between the sending of shallow/unshallow lines (followed by flush)
and the reading of client haves. If the client has hung up at this
point, exit normally.

Signed-off-by: Daniel Duvall <dan@mutual.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-30 21:18:10 -07:00
Junio C Hamano d8488b9e86 Merge branch 'rs/more-buffered-io'
Use more buffered I/O where we used to call many small write(2)s.

* rs/more-buffered-io:
  upload-pack: use buffered I/O to talk to rev-list
  midx: use buffered I/O to talk to pack-objects
  connected: use buffered I/O to talk to rev-list
2020-08-24 14:54:31 -07:00
Junio C Hamano f577d305c7 Merge branch 'rs/upload-pack-sigchain-fix'
Code clean-up.

* rs/upload-pack-sigchain-fix:
  upload-pack: remove superfluous sigchain_pop() call
2020-08-19 16:14:45 -07:00
René Scharfe a698d67b08 upload-pack: use buffered I/O to talk to rev-list
Like f0bca72dc7 (send-pack: use buffered I/O to talk to pack-objects,
2016-06-08), significantly reduce the number of system calls and
simplify the code for sending object IDs to rev-list by using stdio's
buffering.

Take care to handle errors immediately to get the correct error code,
and to flush the buffer explicitly before closing the stream in order to
catch any write errors for these last bytes.

Helped-by: Chris Torek <chris.torek@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 10:29:39 -07:00
Junio C Hamano 73a9255166 Merge branch 'tb/upload-pack-filters'
The component to respond to "git fetch" request is made more
configurable to selectively allow or reject object filtering
specification used for partial cloning.

* tb/upload-pack-filters:
  t5616: use test_i18ngrep for upload-pack errors
  upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
  upload-pack.c: allow banning certain object filter(s)
  list_objects_filter_options: introduce 'list_object_filter_config_name'
2020-08-11 18:04:13 -07:00
René Scharfe e767963ab6 upload-pack: remove superfluous sigchain_pop() call
2997178ee6 (upload-pack: split check_unreachable() in two, prep for
get_reachable_list(), 2016-06-12) moved most code of has_unreachable()
into the new function do_reachable_revlist().  The latter takes care to
ignore SIGPIPE during its operations, and restores the original signal
handler before returning.

However, a sigchain_pop(SIGPIPE) call remained in the error handling
code of has_unreachable(), which does nothing because the stack is
empty after do_reachable_revlist() cleaned up after itself.  Remove it.

Signed-off-by: René Scharfe <l.s.r@web.de>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-11 14:01:18 -07:00
Junio C Hamano 46b225f153 Merge branch 'jk/strvec'
The argv_array API is useful for not just managing argv but any
"vector" (NULL-terminated array) of strings, and has seen adoption
to a certain degree.  It has been renamed to "strvec" to reduce the
barrier to adoption.

* jk/strvec:
  strvec: rename struct fields
  strvec: drop argv_array compatibility layer
  strvec: update documention to avoid argv_array
  strvec: fix indentation in renamed calls
  strvec: convert remaining callers away from argv_array name
  strvec: convert more callers away from argv_array name
  strvec: convert builtin/ callers away from argv_array name
  quote: rename sq_dequote_to_argv_array to mention strvec
  strvec: rename files from argv-array to strvec
  argv-array: rename to strvec
  argv-array: use size_t for count and alloc
2020-08-10 10:23:57 -07:00
Taylor Blau 5b01a4e8ff upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
In b79cf959b2 (upload-pack.c: allow banning certain object filter(s),
2020-02-26), we introduced functionality to disallow certain object
filters from being chosen from within 'git upload-pack'. Traditionally,
administrators use this functionality to disallow filters that are known
to perform slowly, for e.g., those that do not have bitmap-level
filtering.

In the past, the '--filter=tree:<n>' was one such filter that does not
have bitmap-level filtering support, and so was likely to be banned by
administrators.

However, in the previous couple of commits, we introduced bitmap-level
filtering for the case when 'n' is equal to '0', i.e., as if we had a
'--filter=tree:none' choice.

While it would be sufficient to simply write

  $ git config uploadpackfilter.tree.allow true

(since it would allow all values of 'n'), we would like to be able to
allow this filter for certain values of 'n', i.e., those no greater than
some pre-specified maximum.

In order to do this, introduce a new configuration key, as follows:

  $ git config uploadpackfilter.tree.maxDepth <m>

where '<m>' specifies the maximum allowed value of 'n' in the filter
'tree:n'. Administrators who wish to allow for only the value '0' can
write:

  $ git config uploadpackfilter.tree.allow true
  $ git config uploadpackfilter.tree.maxDepth 0

which allows '--filter=tree:0', but no other values.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03 18:03:46 -07:00
Taylor Blau 6dd3456a8c upload-pack.c: allow banning certain object filter(s)
Git clients may ask the server for a partial set of objects, where the
set of objects being requested is refined by one or more object filters.
Server administrators can configure 'git upload-pack' to allow or ban
these filters by setting the 'uploadpack.allowFilter' variable to
'true' or 'false', respectively.

However, administrators using bitmaps may wish to allow certain kinds of
object filters, but ban others. Specifically, they may wish to allow
object filters that can be optimized by the use of bitmaps, while
rejecting other object filters which aren't and represent a perceived
performance degradation (as well as an increased load factor on the
server).

Allow configuring 'git upload-pack' to support object filters on a
case-by-case basis by introducing two new configuration variables:

  - 'uploadpackfilter.allow'
  - 'uploadpackfilter.<kind>.allow'

where '<kind>' may be one of 'blobNone', 'blobLimit', 'tree', and so on.

Setting the second configuration variable for any valid value of
'<kind>' explicitly allows or disallows restricting that kind of object
filter.

If a client requests the object filter <kind> and the respective
configuration value is not set, 'git upload-pack' will default to the
value of 'uploadpackfilter.allow', which itself defaults to 'true' to
maintain backwards compatibility. Note that this differs from
'uploadpack.allowfilter', which controls whether or not the 'filter'
capability is advertised.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03 18:03:41 -07:00
Jeff King d70a9eb611 strvec: rename struct fields
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").

Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 19:18:06 -07:00
Jeff King f6d8942b1f strvec: fix indentation in renamed calls
Code which split an argv_array call across multiple lines, like:

  argv_array_pushl(&args, "one argument",
                   "another argument", "and more",
		   NULL);

was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:

  strvec_pushl(&args, "one argument",
                   "another argument", "and more",
		   NULL);

Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:

  git jump grep 'strvec_.*,$'

and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King c972bf4cf5 strvec: convert remaining callers away from argv_array name
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).

This patch converts all of the remaining files, as the resulting diff is
reasonably sized.

The conversion was done purely mechanically with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe '
    s/ARGV_ARRAY/STRVEC/g;
    s/argv_array/strvec/g;
  '

We'll deal with any indentation/style fallouts separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King dbbcd44fb4 strvec: rename files from argv-array to strvec
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe 's/argv-array.h/strvec.h/'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:17 -07:00
Jonathan Tan 77aa0941ce upload-pack: do not lazy-fetch "have" objects
When upload-pack receives a request containing "have" hashes, it (among
other things) checks if the served repository has the corresponding
objects. However, it does not do so with the
OBJECT_INFO_SKIP_FETCH_OBJECT flag, so if serving a partial clone, a
lazy fetch will be triggered first.

This was discovered at $DAYJOB when a user fetched from a partial clone
(into another partial clone - although this would also happen if the
repo to be fetched into is not a partial clone).

Therefore, whenever "have" hashes are checked for existence, pass the
OBJECT_INFO_SKIP_FETCH_OBJECT flag. Also add the OBJECT_INFO_QUICK flag
to improve performance, as it is typical that such objects do not exist
in the serving repo, and the consequences of a false negative are minor
(usually, a slightly larger pack sent).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:07:19 -07:00
Junio C Hamano 12210859da Merge branch 'bc/sha-256-part-2'
SHA-256 migration work continues.

* bc/sha-256-part-2: (44 commits)
  remote-testgit: adapt for object-format
  bundle: detect hash algorithm when reading refs
  t5300: pass --object-format to git index-pack
  t5704: send object-format capability with SHA-256
  t5703: use object-format serve option
  t5702: offer an object-format capability in the test
  t/helper: initialize the repository for test-sha1-array
  remote-curl: avoid truncating refs with ls-remote
  t1050: pass algorithm to index-pack when outside repo
  builtin/index-pack: add option to specify hash algorithm
  remote-curl: detect algorithm for dumb HTTP by size
  builtin/ls-remote: initialize repository based on fetch
  t5500: make hash independent
  serve: advertise object-format capability for protocol v2
  connect: parse v2 refs with correct hash algorithm
  connect: pass full packet reader when parsing v2 refs
  Documentation/technical: document object-format for protocol v2
  t1302: expect repo format version 1 for SHA-256
  builtin/show-index: provide options to determine hash algo
  t5302: modernize test formatting
  ...
2020-07-06 22:09:13 -07:00
Junio C Hamano 34e849b05a Merge branch 'jt/cdn-offload'
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.

* jt/cdn-offload:
  upload-pack: fix a sparse '0 as NULL pointer' warning
  upload-pack: send part of packfile response as uri
  fetch-pack: support more than one pack lockfile
  upload-pack: refactor reading of pack-objects out
  Documentation: add Packfile URIs design doc
  Documentation: order protocol v2 sections
  http-fetch: support fetching packfiles by URL
  http-fetch: refactor into function
  http: refactor finish_http_pack_request()
  http: use --stdin when indexing dumb HTTP pack
2020-06-25 12:27:47 -07:00
Ramsay Jones cae2ee1055 upload-pack: fix a sparse '0 as NULL pointer' warning
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17 13:22:40 -07:00
Christian Couder ea2c6e6083 upload-pack: refactor common code into do_got_oid()
As 'upload-pack.c' is now using 'struct upload_pack_data'
thoroughly, let's refactor some common code into a new
do_got_oid() function.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:35 -07:00
Christian Couder f01c7916b8 upload-pack: move oldest_have to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'oldest_have' static variable
into this struct.

It is used by both protocol v0 and protocol v2 code.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:35 -07:00
Christian Couder 460ed0d4b4 upload-pack: pass upload_pack_data to got_oid()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to got_oid(), so that
this function can use all the fields of the struct.

This will be used in followup commits to move a static variable
into 'upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:35 -07:00
Christian Couder 0866734820 upload-pack: pass upload_pack_data to ok_to_give_up()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to ok_to_give_up(), so
that this function can use all the fields of the struct.

This will be used in followup commits to move a static variable
into 'upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:35 -07:00
Christian Couder 6fbbc4374f upload-pack: pass upload_pack_data to send_acks()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to send_acks(), so
that this function can use all the fields of the struct.

This will be used in followup commits to move a static variable
into 'upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:35 -07:00
Christian Couder 8dcf22785f upload-pack: pass upload_pack_data to process_haves()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to process_haves(), so
that this function can use all the fields of the struct.

This will be used in followup commits to move a static variable
into 'upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:35 -07:00
Christian Couder 629060d9bb upload-pack: change allow_unadvertised_object_request to an enum
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's change allow_unadvertised_object_request,
which is now part of 'upload_pack_data', from an 'unsigned int'
to an enum.

This will make it clear which values this variable can take.

While at it let's change this variable name to 'allow_uor' to
make it shorter.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:35 -07:00
Christian Couder f1514c6aad upload-pack: move allow_unadvertised_object_request to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'allow_unadvertised_object_request'
static variable into this struct.

It is used by code common to protocol v0 and protocol v2.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:35 -07:00
Christian Couder de0e9f7498 upload-pack: move extra_edge_obj to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'extra_edge_obj' static variable
into this struct.

It is used by code common to protocol v0 and protocol v2.

While at it let's properly initialize and clear 'extra_edge_obj'
in the appropriate 'upload_pack_data' initialization and
clearing functions.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:34 -07:00
Christian Couder 35b43a107c upload-pack: move shallow_nr to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'shallow_nr' static variable
into this struct.

It is used by code common to protocol v0 and protocol v2.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:34 -07:00
Christian Couder 329f996050 upload-pack: pass upload_pack_data to send_unshallow()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to send_unshallow(), so
that this function can use all the fields of the struct.

This will be used in followup commits to move static variables
into 'upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:34 -07:00
Christian Couder 446e42c559 upload-pack: pass upload_pack_data to deepen_by_rev_list()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to deepen_by_rev_list(),
so that this function can use all the fields of the struct.

This will be used in followup commits to move static variables
into 'upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:34 -07:00
Christian Couder b1492f22f0 upload-pack: pass upload_pack_data to deepen()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to deepen(), so that
this function can use all the fields of the struct.

This will be used in followup commits to move static variables
into 'upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:34 -07:00
Christian Couder ee703c8a43 upload-pack: pass upload_pack_data to send_shallow_list()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to send_shallow_list(),
so that this function can use all the fields of the struct.

This will be used in followup commits to move static variables
into 'upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-11 13:35:34 -07:00
Jonathan Tan dd4b732df7 upload-pack: send part of packfile response as uri
Teach upload-pack to send part of its packfile response as URIs.

An administrator may configure a repository with one or more
"uploadpack.blobpackfileuri" lines, each line containing an OID, a pack
hash, and a URI. A client may configure fetch.uriprotocols to be a
comma-separated list of protocols that it is willing to use to fetch
additional packfiles - this list will be sent to the server. Whenever an
object with one of those OIDs would appear in the packfile transmitted
by upload-pack, the server may exclude that object, and instead send the
URI. The client will then download the packs referred to by those URIs
before performing the connectivity check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 18:06:34 -07:00
Jonathan Tan acaaca7d70 upload-pack: refactor reading of pack-objects out
Subsequent patches will change how the output of pack-objects is
processed, so extract that processing into its own function.

Currently, at most 1 character can be buffered (in the "buffered" local
variable). One of those patches will require a larger buffer, so replace
that "buffered" local variable with a buffer array.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 18:06:34 -07:00
Christian Couder 339a9840ef upload-pack: move pack_objects_hook to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'pack_objects_hook' static
variable into this struct.

It is used by code common to protocol v0 and protocol v2.

While at it let's also free() it in upload_pack_data_clear().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:27 -07:00
Christian Couder e3835cd4bc upload-pack: move allow_sideband_all to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'allow_sideband_all' static
variable into this struct.

It is used only by protocol v2 code.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:27 -07:00
Christian Couder d1d7a94526 upload-pack: move allow_ref_in_want to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'allow_ref_in_want' static
variable into this struct.

It is used only by protocol v2 code.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:27 -07:00
Christian Couder 59abe19624 upload-pack: move allow_filter to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'allow_filter' static variable
into this struct.

It is used by both protocol v0 and protocol v2 code.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:26 -07:00
Christian Couder f203a88cf1 upload-pack: move keepalive to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'keepalive' static variable
into this struct.

It is used by code common to protocol v0 and protocol v2.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:26 -07:00
Christian Couder 8a0e6f16ca upload-pack: pass upload_pack_data to upload_pack_config()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to upload_pack_config(),
so that this function can use all the fields of the struct.

This will be used in followup commits to move static variables
that are set in upload_pack_config() into 'upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:26 -07:00
Christian Couder e9d882b81e upload-pack: change multi_ack to an enum
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's take this opportunity to change the
'multi_ack' variable, which is now part of 'upload_pack_data',
to an enum.

This will make it clear which values this variable can take.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:26 -07:00
Christian Couder 53d69506c1 upload-pack: move multi_ack to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the multi_ack static variable into
this struct.

It is only used by protocol v0 code since protocol v2 assumes
certain baseline capabilities, but rolling it into
upload_pack_data and just letting v2 code ignore it as it does
now is more coherent and cleaner.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:26 -07:00
Christian Couder 59a902612a upload-pack: move filter_capability_requested to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the filter_capability_requested
static variable into this struct.

It is only used by protocol v0 code since protocol v2 assumes
certain baseline capabilities, but rolling it into
upload_pack_data and just letting v2 code ignore it as it does
now is more coherent and cleaner.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:26 -07:00
Christian Couder f8edd1ca3c upload-pack: move use_sideband to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'use_sideband' static variable
into this struct.

This variable is used by both v0 and v2 protocols.

While at it, let's update the comment near the variable
definition.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:25 -07:00
Christian Couder d40f04e0b0 upload-pack: move static vars to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the 'no_done', 'daemon_mode' and
'timeout' variables into this struct.

They are only used by protocol v0 code since protocol v2 assumes
certain baseline capabilities, but rolling them into
upload_pack_data and just letting v2 code ignore them as it does
now is more coherent and cleaner.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:25 -07:00
Christian Couder a849728821 upload-pack: annotate upload_pack_data fields
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's annotate fields from this struct to let
people know which ones are used only for protocol v0 and which
ones only for protocol v2.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:25 -07:00
Jeff King b5a2068cb1 upload-pack: actually use some upload_pack_data bitfields
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's actually start using some bitfields of
that struct. These bitfields were introduced in 3145ea957d
("upload-pack: introduce fetch server command", 2018-03-15), but
were never used.

We could instead have just removed the following bitfields
from the struct:

unsigned use_thin_pack : 1;
unsigned use_ofs_delta : 1;
unsigned no_progress : 1;
unsigned use_include_tag : 1;

but using them makes it possible to remove a number of static
variables with the same name and purpose from 'upload-pack.c'.

This is a behavior change, as we accidentally used to let values
in those bitfields propagate from one v2 "fetch" command to
another for ssh/git/file connections (but not for http). That's
fixing a bug, but one nobody is likely to see, because it would
imply the client sending different capabilities for each request.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 10:58:24 -07:00
brian m. carlson bf30dbf826 remote: advertise the object-format capability on the server side
Advertise the current hash algorithm in use by using the object-format
capability as part of the ref advertisement.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27 10:07:06 -07:00
Christian Couder 7a516764a3 upload-pack: use upload_pack_data fields in receive_needs()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's use fields from this struct in
receive_needs(), instead of local variables with the same name
and purpose.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:02 -07:00
Christian Couder c9f0325905 upload-pack: pass upload_pack_data to create_pack_file()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to create_pack_file(),
so that this function, and the function it calls, can use all
the fields of the struct.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:02 -07:00
Christian Couder df654abccc upload-pack: remove static variable 'stateless_rpc'
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's remove the 'stateless_rpc' static
variable, as we can now use the field of 'struct upload_pack_data'
with the same name instead.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:02 -07:00
Christian Couder b08c97423f upload-pack: pass upload_pack_data to check_non_tip()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to check_non_tip(), so
that this function and the functions it calls, can use all the
fields of the struct in followup commits.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:02 -07:00
Christian Couder 762f92767c upload-pack: pass upload_pack_data to send_ref()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass that struct to send_ref(), so that
this function, and the functions it calls, can use all the
fields of the struct in followup commits.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:02 -07:00
Christian Couder 438528f611 upload-pack: move symref to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, we are passing around that struct to many
functions, so let's also pass 'struct string_list symref' around
at the same time by moving it from a local variable in
upload_pack() into a field of 'struct upload_pack_data'.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:01 -07:00
Christian Couder 4ace0283b7 upload-pack: use upload_pack_data writer in receive_needs()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's use the 'struct packet_writer writer'
field from 'struct upload_pack_data' in receive_needs(),
instead of a local 'struct packet_writer writer' variable.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:01 -07:00
Christian Couder d92ae2c090 upload-pack: pass upload_pack_data to receive_needs()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass 'struct upload_pack_data' to
receive_needs(), so that this function and the functions it
calls can use all the fields of that struct in followup commits.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:01 -07:00
Christian Couder 0797769505 upload-pack: pass upload_pack_data to get_common_commits()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's pass 'struct upload_pack_data' to
get_common_commits(), so that this function and the functions
it calls can use all the fields of that struct in followup
commits.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:01 -07:00
Christian Couder ebf8ebcc56 upload-pack: use 'struct upload_pack_data' in upload_pack()
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's use 'struct upload_pack_data' in
upload_pack().

This will make it possible in followup commits to remove a lot
of static variables and local variables that have the same name
and purpose as fields in 'struct upload_pack_data'. This will
also make upload_pack() work in a more similar way as
upload_pack_v2().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:01 -07:00
Christian Couder e8498322fa upload-pack: move 'struct upload_pack_data' around
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move 'struct upload_pack_data' and the
related upload_pack_data_init() and upload_pack_data_clear()
functions towards the beginning of the file, so that this struct
and its related functions can then be used by upload_pack() in a
followup commit.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:01 -07:00
Christian Couder 389f161ab2 upload-pack: move {want,have}_obj to upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's move the want_obj and have_obj object
arrays into 'struct upload_pack_data'.

These object arrays are used by both upload_pack() and
upload_pack_v2(), for example when these functions call
create_pack_file(). We are going to use
'struct upload_pack_data' in upload_pack() in a followup
commit.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:01 -07:00
Christian Couder 06c7a43296 upload-pack: remove unused 'wants' from upload_pack_data
As we cleanup 'upload-pack.c' by using 'struct upload_pack_data'
more thoroughly, let's remove 'struct object_array wants' from
'struct upload_pack_data', as it appears to be unused.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:58:01 -07:00
Junio C Hamano a0125885f5 Merge branch 'cc/upload-pack-v2-fetch-fix'
Serving a "git fetch" client over "git://" and "ssh://" protocols
using the on-wire protocol version 2 was buggy on the server end
when the client needs to make a follow-up request to
e.g. auto-follow tags.

* cc/upload-pack-v2-fetch-fix:
  upload-pack: clear filter_options for each v2 fetch command
2020-05-13 12:19:21 -07:00
Junio C Hamano 896833b268 Merge branch 'tb/shallow-cleanup'
Code cleanup.

* tb/shallow-cleanup:
  shallow: use struct 'shallow_lock' for additional safety
  shallow.h: document '{commit,rollback}_shallow_file'
  shallow: extract a header file for shallow-related functions
  commit: make 'commit_graft_pos' non-static
2020-05-13 12:19:18 -07:00
Christian Couder 08450ef791 upload-pack: clear filter_options for each v2 fetch command
Because of the request/response model of protocol v2, the
upload_pack_v2() function is sometimes called twice in the same
process, while 'struct list_objects_filter_options filter_options'
was declared as static at the beginning of 'upload-pack.c'.

This made the check in list_objects_filter_die_if_populated(), which
is called by process_args(), fail the second time upload_pack_v2() is
called, as filter_options had already been populated the first time.

To fix that, filter_options is not static any more. It's now owned
directly by upload_pack(). It's now also part of 'struct
upload_pack_data', so that it's owned indirectly by upload_pack_v2().

In the long term, the goal is to also have upload_pack() use
'struct upload_pack_data', so adding filter_options to this struct
makes more sense than to have it owned directly by upload_pack_v2().

This fixes the first of the 2 bugs documented by d0badf8797
(partial-clone: demonstrate bugs in partial fetch, 2020-02-21).

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-08 11:07:27 -07:00
Taylor Blau 120ad2b0f1 shallow: extract a header file for shallow-related functions
There are many functions in commit.h that are more related to shallow
repositories than they are to any sort of generic commit machinery.
Likely this began when there were only a few shallow-related functions,
and commit.h seemed a reasonable enough place to put them.

But, now there are a good number of shallow-related functions, and
placing them all in 'commit.h' doesn't make sense.

This patch extracts a 'shallow.h', which takes all of the declarations
from 'commit.h' for functions which already exist in 'shallow.c'. We
will bring the remaining shallow-related functions defined in 'commit.c'
in a subsequent patch.

For now, move only the ones that already are implemented in 'shallow.c',
and update the necessary includes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30 14:19:13 -07:00
Jeff King 4845b77245 upload-pack: handle unexpected delim packets
When processing the arguments list for a v2 ls-refs or fetch command, we
loop like this:

  while (packet_reader_read(request) != PACKET_READ_FLUSH) {
          const char *arg = request->line;
	  ...handle arg...
  }

to read and handle packets until we see a flush. The hidden assumption
here is that anything except PACKET_READ_FLUSH will give us valid packet
data to read. But that's not true; PACKET_READ_DELIM or PACKET_READ_EOF
will leave packet->line as NULL, and we'll segfault trying to look at
it.

Instead, we should follow the more careful model demonstrated on the
client side (e.g., in process_capabilities_v2): keep looping as long
as we get normal packets, and then make sure that we broke out of the
loop due to a real flush. That fixes the segfault and correctly
diagnoses any unexpected input from the client.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 12:18:48 -07:00
Matthew Rogers 6dc905d974 config: split repo scope to local and worktree
Previously when iterating through git config variables, worktree config
and local config were both considered "CONFIG_SCOPE_REPO".  This was
never a problem before as no one had needed to differentiate between the
two cases, but future functionality may care whether or not the config
options come from a worktree or from the repository's actual local
config file.  For example, the planned feature to add a '--show-scope'
to config to allow a user to see which scope listed config options come
from would confuse users if it just printed 'repo' rather than 'local'
or 'worktree' as the documentation would lead them to expect.  As well
as the additional benefit of making the implementation look more like
how the documentation describes the interface.

To accomplish this we split out what was previously considered repo
scope to be local and worktree.

The clients of 'current_config_scope()' who cared about
CONFIG_SCOPE_REPO are also modified to similarly care about
CONFIG_SCOPE_WORKTREE and CONFIG_SCOPE_LOCAL to preserve previous behavior.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:32:20 -08:00
Junio C Hamano 098e8c6716 Merge branch 'jk/disable-commit-graph-during-upload-pack'
The "upload-pack" (the counterpart of "git fetch") needs to disable
commit-graph when responding to a shallow clone/fetch request, but
the way this was done made Git panic, which has been corrected.

* jk/disable-commit-graph-during-upload-pack:
  upload-pack: disable commit graph more gently for shallow traversal
  commit-graph: bump DIE_ON_LOAD check to actual load-time
2019-10-07 11:32:55 +09:00
Junio C Hamano 627b826834 Merge branch 'md/list-objects-filter-combo'
The list-objects-filter API (used to create a sparse/lazy clone)
learned to take a combined filter specification.

* md/list-objects-filter-combo:
  list-objects-filter-options: make parser void
  list-objects-filter-options: clean up use of ALLOC_GROW
  list-objects-filter-options: allow mult. --filter
  strbuf: give URL-encoding API a char predicate fn
  list-objects-filter-options: make filter_spec a string_list
  list-objects-filter-options: move error check up
  list-objects-filter: implement composite filters
  list-objects-filter-options: always supply *errbuf
  list-objects-filter: put omits set in filter struct
  list-objects-filter: encapsulate filter components
2019-09-18 11:50:09 -07:00
Jeff King 6abada1880 upload-pack: disable commit graph more gently for shallow traversal
When the client has asked for certain shallow options like
"deepen-since", we do a custom rev-list walk that pretends to be
shallow. Before doing so, we have to disable the commit-graph, since it
is not compatible with the shallow view of the repository. That's
handled by 829a321569 (commit-graph: close_commit_graph before shallow
walk, 2018-08-20). That commit literally closes and frees our
repo->objects->commit_graph struct.

That creates an interesting problem for commits that have _already_ been
parsed using the commit graph. Their commit->object.parsed flag is set,
their commit->graph_pos is set, but their commit->maybe_tree may still
be NULL. When somebody later calls repo_get_commit_tree(), we see that
we haven't loaded the tree oid yet and try to get it from the commit
graph. But since it has been freed, we segfault!

So the root of the issue is a data dependency between the commit's
lazy-load of the tree oid and the fact that the commit graph can go
away mid-process. How can we resolve it?

There are a couple of general approaches:

  1. The obvious answer is to avoid loading the tree from the graph when
     we see that it's NULL. But then what do we return for the tree oid?
     If we return NULL, our caller in do_traverse() will rightly
     complain that we have no tree. We'd have to fallback to loading the
     actual commit object and re-parsing it. That requires teaching
     parse_commit_buffer() to understand re-parsing (i.e., not starting
     from a clean slate and not leaking any allocated bits like parent
     list pointers).

  2. When we close the commit graph, walk through the set of in-memory
     objects and clear any graph_pos pointers. But this means we also
     have to "unparse" any such commits so that we know they still need
     to open the commit object to fill in their trees. So it's no less
     complicated than (1), and is more expensive (since we clear objects
     we might not later need).

  3. Stop freeing the commit-graph struct. Continue to let it be used
     for lazy-loads of tree oids, but let upload-pack specify that it
     shouldn't be used for further commit parsing.

  4. Push the whole shallow rev-list out to its own sub-process, with
     the commit-graph disabled from the start, giving it a clean memory
     space to work from.

I've chosen (3) here. Options (1) and (2) would work, but are
non-trivial to implement. Option (4) is more expensive, and I'm not sure
how complicated it is (shelling out for the actual rev-list part is
easy, but we do then parse the resulting commits internally, and I'm not
clear which parts need to be handling shallow-ness).

The new test in t5500 triggers this segfault, but see the comments there
for how horribly intimate it has to be with how both upload-pack and
commit graphs work.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-12 12:30:08 -07:00
Junio C Hamano a7db4c193d Merge branch 'jk/oidhash'
Code clean-up to remove hardcoded SHA-1 hash from many places.

* jk/oidhash:
  hashmap: convert sha1hash() to oidhash()
  hash.h: move object_id definition from cache.h
  khash: rename oid helper functions
  khash: drop sha1-specific map types
  pack-bitmap: convert khash_sha1 maps into kh_oid_map
  delta-islands: convert island_marks khash to use oids
  khash: rename kh_oid_t to kh_oid_set
  khash: drop broken oid_map typedef
  object: convert create_object() to use object_id
  object: convert internal hash_obj() to object_id
  object: convert lookup_object() to use object_id
  object: convert lookup_unknown_object() to use object_id
  pack-objects: convert locate_object_entry_hash() to object_id
  pack-objects: convert packlist_find() to use object_id
  pack-bitmap-write: convert some helpers to use object_id
  upload-pack: rename a "sha1" variable to "oid"
  describe: fix accidental oid/hash type-punning
2019-07-09 15:25:43 -07:00
Junio C Hamano 5cb7c73589 Merge branch 'ds/close-object-store'
The commit-graph file is now part of the "files that the runtime
may keep open file descriptors on, all of which would need to be
closed when done with the object store", and the file descriptor to
an existing commit-graph file now is closed before "gc" finalizes a
new instance to replace it.

* ds/close-object-store:
  packfile: rename close_all_packs to close_object_store
  packfile: close commit-graph in close_all_packs
  commit-graph: use raw_object_store when closing
2019-07-09 15:25:37 -07:00
Matthew DeVore 489fc9ee71 list-objects-filter-options: allow mult. --filter
Allow combining of multiple filters by simply repeating the --filter
flag. Before this patch, the user had to combine them in a single flag
somewhat awkwardly (e.g. --filter=combine:FOO+BAR), including
URL-encoding the individual filters.

To make this work, in the --filter flag parsing callback, rather than
error out when we detect that the filter_options struct is already
populated, we modify it in-place to contain the added sub-filter. The
existing sub-filter becomes the lhs of the combined filter, and the
next sub-filter becomes the rhs. We also have to URL-encode the LHS and
RHS sub-filters.

We can simplify the operation if the LHS is already a combine: filter.
In that case, we just append the URL-encoded RHS sub-filter to the LHS
spec to get the new spec.

Helped-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Matthew DeVore cf9ceb5a12 list-objects-filter-options: make filter_spec a string_list
Make the filter_spec string a string_list rather than a raw C string.
The list of strings must be concatted together to make a complete
filter_spec. A future patch will use this capability to build "combine:"
filter specs gradually.

A strbuf would seem to be a more natural choice for this object, but it
unfortunately requires initialization besides just zero'ing out the
memory.  This results in all container structs, and all containers of
those structs, etc., to also require initialization. Initializing them
all would be more cumbersome that simply using a string_list, which
behaves properly when its contents are zero'd.

For the purposes of code simplification, change behavior in how filter
specs are conveyed over the protocol: do not normalize the tree:<depth>
filter specs since there should be no server in existence that supports
tree:# but not tree:#k etc.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Jeff King d0229abd93 object: convert lookup_object() to use object_id
There are no callers left of lookup_object() that aren't just passing us
the "hash" member of a "struct object_id". Let's take the whole struct,
which gets us closer to removing all raw sha1 variables.  It also
matches the existing conversions of lookup_blob(), etc.

The conversions of callers were done by hand, but they're all mechanical
one-liners.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 10:18:09 -07:00
Jeff King 0ebbcf70e6 object: convert lookup_unknown_object() to use object_id
There are no callers left of lookup_unknown_object() that aren't just
passing us the "hash" member of a "struct object_id". Let's take the
whole struct, which gets us closer to removing all raw sha1 variables.
It also matches the existing conversions of lookup_blob(), etc.

The conversions of callers were done by hand, but they're all mechanical
one-liners.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 10:06:19 -07:00
Jeff King 62b89d43e2 upload-pack: rename a "sha1" variable to "oid"
This variable is a "struct object_id", but uses the old-style name
"sha1". Let's call it oid to match more modern code (and make it clear
that it can handle any algorithm, since it uses parse_oid_hex()
properly).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 09:32:54 -07:00
Junio C Hamano 86d87307c1 Merge branch 'jk/HEAD-symref-in-xfer-namespaces'
The server side support for "git fetch" used to show incorrect
value for the HEAD symbolic ref when the namespace feature is in
use, which has been corrected.

* jk/HEAD-symref-in-xfer-namespaces:
  upload-pack: strip namespace from symref data
2019-06-17 10:15:15 -07:00
Derrick Stolee c3a3a964b2 commit-graph: use raw_object_store when closing
The close_commit_graph() method took a repository struct, but then
only uses the raw_object_store within. Change the function prototype
to make the method more flexible.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:33:54 -07:00
Jeff King 533e088250 upload-pack: strip namespace from symref data
Since 7171d8c15f (upload-pack: send symbolic ref information as
capability, 2013-09-17), we've sent cloning and fetching clients special
information about which branch HEAD is pointing to, so that they don't
have to guess based on matching up commit ids.

However, this feature has never worked properly with the GIT_NAMESPACE
feature.  Because upload-pack uses head_ref_namespaced(find_symref), we
do find and report on refs/namespaces/foo/HEAD instead of the actual
HEAD of the repo. This makes sense, since the branch pointed to by the
top-level HEAD may not be advertised at all. But we do two things wrong:

  1. We report the full name refs/namespaces/foo/HEAD, instead of just
     HEAD. Meaning no client is going to bother doing anything with that
     symref, since we're not otherwise advertising it.

  2. We report the symref destination using its full name (e.g.,
     refs/namespaces/foo/refs/heads/master). That's similarly useless to
     the client, who only saw "refs/heads/master" in the advertisement.

We should be stripping the namespace prefix off of both places (which
this patch fixes).

Likely nobody noticed because we tend to do the right thing anyway. Bug
(1) means that we said nothing about HEAD (just refs/namespace/foo/HEAD).
And so the client half of the code, from a45b5f0552 (connect: annotate
refs with their symref information in get_remote_head(), 2013-09-17),
does not annotate HEAD, and we use the fallback in guess_remote_head(),
matching refs by object id. Which is usually right. It only falls down
in ambiguous cases, like the one laid out in the included test.

This also means that we don't have to worry about breaking anybody who
was putting pre-stripped names into their namespace symrefs when we fix
bug (2). Because of bug (1), nobody would have been using the symref we
advertised in the first place (not to mention that those symrefs would
have appeared broken for any non-namespaced access).

Note that we have separate fixes here for the v0 and v2 protocols. The
symref advertisement moved in v2 to be a part of the ls-refs command.
This actually gets part (1) right, since the symref annotation
piggy-backs on the existing ref advertisement, which is properly
stripped. But it still needs a fix for part (2). The included tests
cover both protocols.

Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28 10:02:00 -07:00
Junio C Hamano 97616ca488 Merge branch 'en/unicode-in-refnames'
On a filesystem like HFS+, the names of the refs stored as filesystem
entities may become different from what the end-user expects, just
like files in the working tree get "renamed".  Work around the
mismatch by paying attention to the core.precomposeUnicode
configuration.

* en/unicode-in-refnames:
  Honor core.precomposeUnicode in more places
2019-05-19 16:45:30 +09:00