git/connected.h
Jonathan Tan 2b98478c6f connected: always use partial clone optimization
With 50033772d5 ("connected: verify promisor-ness of partial clone",
2020-01-30), the fast path (checking promisor packs) in
check_connected() now passes a subset of the slow path (rev-list) - if
all objects to be checked are found in promisor packs, both the fast
path and the slow path will pass; otherwise, the fast path will
definitely not pass. This means that we can always attempt the fast path
whenever we need to do the slow path.

The fast path is currently guarded by a flag; therefore, remove that
flag. Also, make the fast path fall back to the slow path - if the fast
path fails, the failing OID and all remaining OIDs will be passed to
rev-list.
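
The fallback described above can be sketched in isolation. This is a
minimal, standalone illustration and not the code in connected.c:
struct object_id is a toy stand-in, and oid_array_iter, next_oid(),
in_promisor_pack() and collect_for_slow_path() are hypothetical names
invented for this sketch. The real slow path spawns rev-list; here the
OIDs that would be handed to it are only collected into an array.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for git's struct object_id (assumption for this sketch). */
struct object_id { char hex[41]; };

typedef int (*oid_iterate_fn)(void *, struct object_id *oid);

/* Hypothetical array-backed iterator state. */
struct oid_array_iter {
	const char **hex;
	size_t nr, pos;
};

/* Return 0 and fill *oid while entries remain, -1 at EOF. */
static int next_oid(void *cb_data, struct object_id *oid)
{
	struct oid_array_iter *it = cb_data;

	if (it->pos >= it->nr)
		return -1;
	strncpy(oid->hex, it->hex[it->pos++], sizeof(oid->hex) - 1);
	oid->hex[sizeof(oid->hex) - 1] = '\0';
	return 0;
}

/* Toy predicate: pretend OIDs starting with 'p' are in promisor packs. */
static int in_promisor_pack(const struct object_id *oid)
{
	return oid->hex[0] == 'p';
}

/*
 * Consume OIDs on the fast path. On the first miss, store that OID and
 * every remaining OID in slow[] (up to cap entries) without checking
 * them further. Return the number stored; 0 means the fast path passed.
 */
static size_t collect_for_slow_path(oid_iterate_fn fn, void *cb_data,
				    struct object_id *slow, size_t cap)
{
	struct object_id oid;
	size_t n = 0;
	int missed = 0;

	while (!fn(cb_data, &oid)) {
		if (!missed && in_promisor_pack(&oid))
			continue;
		missed = 1;
		if (n < cap)
			slow[n++] = oid;
	}
	return n;
}
```

Note that an OID following the first miss goes to the slow path even
if it would have passed the fast path, which matches "the failing OID
and all remaining OIDs" above.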

The main user-visible benefit is the performance of fetch from a partial
clone - specifically, the speedup of the connectivity check done before
the fetch. In particular, a no-op fetch into a partial clone on my
computer was sped up from 7 seconds to 0.01 seconds. This is a
complement to the work in 2df1aa239c ("fetch: forgo full
connectivity check if --filter", 2020-01-30), which is the child of the
aforementioned 50033772d5. In that commit, the connectivity check
*after* the fetch was sped up.

The addition of the fast path might cause performance reductions in
these cases:

 - If a partial clone or a fetch into a partial clone fails, Git will
   fruitlessly run rev-list (it is expected that everything fetched
   would go into promisor packs, so if that didn't happen, it is most
   likely that rev-list will fail too).

 - Any connectivity checks done by receive-pack, in the (in my opinion,
   unlikely) event that a partial clone serves receive-pack.

I think that these cases are rare enough, and the performance reduction
in these cases minor enough (additional object DB access), that the
benefit of avoiding a flag outweighs these.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 10:37:44 -07:00


#ifndef CONNECTED_H
#define CONNECTED_H

struct object_id;
struct transport;

/*
 * Take callback data, and return next object name in the buffer.
 * When called after returning the name for the last object, return -1
 * to signal EOF, otherwise return 0.
 */
typedef int (*oid_iterate_fn)(void *, struct object_id *oid);

/*
 * Named-arguments struct for check_connected. All arguments are
 * optional, and can be left to defaults as set by CHECK_CONNECTED_INIT.
 */
struct check_connected_options {
	/* Avoid printing any errors to stderr. */
	int quiet;

	/* --shallow-file to pass to rev-list sub-process */
	const char *shallow_file;

	/* Transport whose objects we are checking, if available. */
	struct transport *transport;

	/*
	 * If non-zero, send error messages to this descriptor rather
	 * than stderr. The descriptor is closed before check_connected
	 * returns.
	 */
	int err_fd;

	/* If non-zero, show progress as we traverse the objects. */
	int progress;

	/*
	 * Insert these variables into the environment of the child process.
	 */
	const char **env;

	/*
	 * If non-zero, check the ancestry chain completely, not stopping at
	 * any existing ref. This is necessary when deepening existing refs
	 * during a fetch.
	 */
	unsigned is_deepening_fetch : 1;
};

#define CHECK_CONNECTED_INIT { 0 }

/*
 * Make sure that all given objects and all objects reachable from them
 * either exist in our object store or (if the repository is a partial
 * clone) are promised to be available.
 *
 * Return 0 if Ok, non zero otherwise (i.e. some missing objects)
 *
 * If "opt" is NULL, behaves as if CHECK_CONNECTED_INIT was passed.
 */
int check_connected(oid_iterate_fn fn, void *cb_data,
		    struct check_connected_options *opt);

#endif /* CONNECTED_H */
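
The calling convention documented in this header can be exercised with a small standalone sketch. Everything here is an assumption made for illustration: struct object_id is a toy stand-in, the options struct is trimmed to a single field, and check_connected_stub() is a hypothetical placeholder that only drains the iterator. It does not run rev-list or inspect any object store. What it shows is the iterator contract (0 per object, then -1 at EOF) and the "NULL means CHECK_CONNECTED_INIT" default.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for git's struct object_id (assumption for this sketch). */
struct object_id { char hex[41]; };

typedef int (*oid_iterate_fn)(void *, struct object_id *oid);

/* Trimmed copy of the options struct: only the field used below. */
struct check_connected_options {
	int quiet;
};
#define CHECK_CONNECTED_INIT { 0 }

/* Hypothetical array-backed iterator state. */
struct oid_array_iter {
	const char **hex;
	size_t nr, pos;
};

/* Return 0 and fill *oid while entries remain, -1 to signal EOF. */
static int next_oid(void *cb_data, struct object_id *oid)
{
	struct oid_array_iter *it = cb_data;

	if (it->pos >= it->nr)
		return -1;
	strncpy(oid->hex, it->hex[it->pos++], sizeof(oid->hex) - 1);
	oid->hex[sizeof(oid->hex) - 1] = '\0';
	return 0;
}

/*
 * Placeholder, NOT git's check_connected(): drains the iterator, counts
 * the OIDs in *seen, and treats an empty name as a "missing" object.
 * Returns 0 if nothing was missing, non-zero otherwise.
 */
static int check_connected_stub(oid_iterate_fn fn, void *cb_data,
				struct check_connected_options *opt,
				size_t *seen)
{
	struct check_connected_options defaults = CHECK_CONNECTED_INIT;
	struct object_id oid;
	int ret = 0;

	if (!opt)
		opt = &defaults; /* NULL behaves like CHECK_CONNECTED_INIT */

	*seen = 0;
	while (!fn(cb_data, &oid)) {
		(*seen)++;
		if (!oid.hex[0]) {
			if (!opt->quiet)
				fprintf(stderr, "missing object\n");
			ret = 1;
		}
	}
	return ret;
}
```

A caller would initialize the options with CHECK_CONNECTED_INIT, set only the fields it cares about (for example opt.quiet = 1), and pass the iterator function together with its state as cb_data.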