Commit graph

12 commits

Author SHA1 Message Date
Junio C Hamano ee99ba7afb Merge branch 'jt/lazy-object-fetch-fix'
The code to backfill objects in lazily cloned repository did not
work correctly, which has been corrected.

* jt/lazy-object-fetch-fix:
  fetch-object: set exact_oid when fetching
  fetch-object: unify fetch_object[s] functions
2018-09-24 10:30:52 -07:00
Jonathan Tan e68302011c fetch-object: set exact_oid when fetching
fetch_objects() currently does not set exact_oid in struct ref when
invoking transport_fetch_refs(). If the server supports ref-in-want,
fetch_pack() uses this field to determine whether a wanted ref should be
requested as a "want-ref" line or a "want" line; without the setting of
exact_oid, the wrong line will be sent.

Set exact_oid, so that the correct line is sent.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-13 13:57:31 -07:00
Derrick Stolee 859fdc0c3c commit-graph: define GIT_TEST_COMMIT_GRAPH
The commit-graph feature is tested in isolation by
t5318-commit-graph.sh and t6600-test-reach.sh, but there are many
more interesting scenarios involving commit walks. Many of these
scenarios are covered by the existing test suite, but we need to
maintain coverage when the optional commit-graph structure is not
present.

To allow running the full test suite with the commit-graph present,
add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar
to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try
to load the commit-graph when parsing commits, and writes the
commit-graph file after every 'git commit' command.

There are a few tests that rely on commits not existing in
pack-files to trigger important events, so manually set
GIT_TEST_COMMIT_GRAPH to false for the necessary commands.

There is one test in t6024-recursive-merge.sh that relies on the
merge-base algorithm picking one of two ambiguous merge-bases, and
the commit-graph feature changes which merge-base is picked.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 10:44:31 -07:00
Jonathan Tan 5d19e8138d repack: repack promisor objects if -a or -A is set
Currently, repack does not touch promisor packfiles at all, potentially
causing the performance of repositories that have many such packfiles to
drop. Therefore, repack all promisor objects if invoked with -a or -A.

This is done by an additional invocation of pack-objects on all promisor
objects individually given, which takes care of deduplication and allows
the resulting packfiles to respect flags such as --max-pack-size.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-09 09:17:39 -07:00
Jonathan Tan cac1137dc4 list-objects: check if filter is NULL before using
In partial_clone_get_default_filter_spec(), the
core_partial_clone_filter_default variable may be NULL; ensure that it
is not NULL before using it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-12 10:46:56 -07:00
Jonathan Tan 0c16cd499d gc: do not repack promisor packfiles
Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Jonathan Tan df11e19648 rev-list: support termination at promisor objects
Teach rev-list to support termination of an object traversal at any
object from a promisor remote (whether one that the local repo also has,
or one that the local repo knows about because it has another promisor
object that references it).

This will be used subsequently in gc and in the connectivity check used
by fetch.

For efficiency, if an object is referenced by a promisor object, and is
in the local repo only as a non-promisor object, object traversal will
not stop there. This is to avoid building the list of promisor object
references.

(In list-objects.c, the case where obj is NULL in process_blob() and
process_tree() do not need to be changed because those happen only when
there is a conflict between the expected type and the existing object.
If the object doesn't exist, an object will be synthesized, which is
fine.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Jonathan Tan 8b4c0103a9 sha1_file: support lazily fetching missing objects
Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
     regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
     about the loose/packed distinction and operate on both differently,
     and ensure that they can handle the concept of objects that are
     neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
     - this searches a single pack that is provided as an argument; the
       caller already knows (through other means) that the sought object
       is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
     it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Jonathan Tan 096c9b8be9 fsck: support promisor objects as CLI argument
Teach fsck to not treat missing promisor objects provided on the CLI as
an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Jonathan Tan caba7fc31a fsck: support referenced promisor objects
Teach fsck to not treat missing promisor objects indirectly pointed to
by refs as an error when extensions.partialclone is set.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Jonathan Tan 43f25158ca fsck: support refs pointing to promisor objects
Teach fsck to not treat refs referring to missing promisor objects as an
error when extensions.partialclone is set.

For the purposes of warning about no default refs, such refs are still
treated as legitimate refs.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Jonathan Tan 498f1f61f1 fsck: introduce partialclone extension
Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage. In such an arrangement, the full set of objects is usually
available in remote storage, ready to be lazily downloaded.

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing promisor objects referenced from the reflog are not an
error case; in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00