Commit graph

535 commits

Author SHA1 Message Date
Junio C Hamano 6d8f8ebb74 Merge branch 'ds/commit-graph-with-grafts'
The recently introduced commit-graph auxiliary data is incompatible
with mechanisms such as replace & grafts that "breaks" immutable
nature of the object reference relationship.  Disable optimizations
based on its use (and updating existing commit-graph) when these
incompatible features are in use in the repository.

* ds/commit-graph-with-grafts:
  commit-graph: close_commit_graph before shallow walk
  commit-graph: not compatible with uninitialized repo
  commit-graph: not compatible with grafts
  commit-graph: not compatible with replace objects
  test-repository: properly init repo
  commit-graph: update design document
  refs.c: upgrade for_each_replace_ref to be a each_repo_ref_fn callback
  refs.c: migrate internal ref iteration to pass thru repository argument
2018-10-16 16:15:59 +09:00
Pratik Karki 103148aad8 merge-base --fork-point: extract libified function
We need this functionality in the builtin rebase.

Note: to make this function truly reusable, we have to switch the call
get_merges_many_dirty() to get_merges_many() because we want the commit
flags to be reset (otherwise, subsequent get_merge_bases() calls would
obtain incorrect results). This did not matter when the function was
called in `git rev-parse --fork-point` because in that command, the
process definitely did not traverse any commits before exiting.

Signed-off-by: Pratik Karki <predatoramigo@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11 14:12:45 +09:00
Junio C Hamano 769af0fd9e Merge branch 'jk/cocci'
spatch transformation to replace boolean uses of !hashcmp() to
newly introduced oideq() is added, and applied, to regain
performance lost due to support of multiple hash algorithms.

* jk/cocci:
  show_dirstat: simplify same-content check
  read-cache: use oideq() in ce_compare functions
  convert hashmap comparison functions to oideq()
  convert "hashcmp() != 0" to "!hasheq()"
  convert "oidcmp() != 0" to "!oideq()"
  convert "hashcmp() == 0" to hasheq()
  convert "oidcmp() == 0" to oideq()
  introduce hasheq() and oideq()
  coccinelle: use <...> for function exclusion
2018-09-17 13:53:57 -07:00
Junio C Hamano fba9654364 Merge branch 'jk/trailer-fixes'
"git interpret-trailers" and its underlying machinery had a buggy
code that attempted to ignore patch text after commit log message,
which triggered in various codepaths that will always get the log
message alone and never get such an input.

* jk/trailer-fixes:
  append_signoff: use size_t for string offsets
  sequencer: ignore "---" divider when parsing trailers
  pretty, ref-filter: format %(trailers) with no_divider option
  interpret-trailers: allow suppressing "---" divider
  interpret-trailers: tighten check for "---" patch boundary
  trailer: pass process_trailer_opts to trailer_info_get()
  trailer: use size_t for iterating trailer list
  trailer: use size_t for string offsets
2018-09-17 13:53:54 -07:00
Junio C Hamano 1b7a91da71 Merge branch 'ds/reachable'
The code for computing history reachability has been shuffled,
obtained a bunch of new tests to cover them, and then being
improved.

* ds/reachable:
  commit-reach: correct accidental #include of C file
  commit-reach: use can_all_from_reach
  commit-reach: make can_all_from_reach... linear
  commit-reach: replace ref_newer logic
  test-reach: test commit_contains
  test-reach: test can_all_from_reach_with_flags
  test-reach: test reduce_heads
  test-reach: test get_merge_bases_many
  test-reach: test is_descendant_of
  test-reach: test in_merge_bases
  test-reach: create new test tool for ref_newer
  commit-reach: move can_all_from_reach_with_flags
  upload-pack: generalize commit date cutoff
  upload-pack: refactor ok_to_give_up()
  upload-pack: make reachable() more generic
  commit-reach: move commit_contains from ref-filter
  commit-reach: move ref_newer from remote.c
  commit.h: remove method declarations
  commit-reach: move walk methods from commit.c
2018-09-17 13:53:52 -07:00
Junio C Hamano 0a866db570 Merge branch 'ds/commit-graph-lockfile-fix'
"git merge-base" in 2.19-rc1 has performance regression when the
(experimental) commit-graph feature is in use, which has been
mitigated.

* ds/commit-graph-lockfile-fix:
  commit: don't use generation numbers if not needed
2018-09-04 14:31:39 -07:00
Derrick Stolee 091f4cf358 commit: don't use generation numbers if not needed
In 3afc679b "commit: use generations in paint_down_to_common()",
the queue in paint_down_to_common() was changed to use a priority
order based on generation number before commit date. This served
two purposes:

 1. When generation numbers are present, the walk guarantees
    correct topological relationships, regardless of clock skew in
    commit dates.

 2. It enables short-circuiting the walk when the min_generation
    parameter is added in d7c1ec3e "commit: add short-circuit to
    paint_down_to_common()". This short-circuit helps commands
    like 'git branch --contains' from needing to walk to a merge
    base when we know the result is false.

The commit message for 3afc679b includes the following sentence:

    This change does not affect the number of commits that are
    walked during the execution of paint_down_to_common(), only
    the order that those commits are inspected.

This statement is incorrect. Because it changes the order in which
the commits are inspected, it changes the order they are added to
the queue, and hence can change the number of loops before the
queue_has_nonstale() method returns true.

This change makes a concrete difference depending on the topology
of the commit graph. For instance, computing the merge-base between
consecutive versions of the Linux kernel has no effect for versions
after v4.9, but 'git merge-base v4.8 v4.9' presents a performance
regression:

    v2.18.0: 0.122s
v2.19.0-rc1: 0.547s
       HEAD: 0.127s

To determine that this was simply an ordering issue, I inserted
a counter within the while loop of paint_down_to_common() and
found that the loop runs 167,468 times in v2.18.0 and 635,579
times in v2.19.0-rc1.

The topology of this case can be described in a simplified way
here:

  v4.9
   |  \
   |   \
  v4.8  \
   | \   \
   |  \   |
  ...  A  B
   |  /  /
   | /  /
   |/__/
   C

Here, the "..." means "a very long line of commits". By generation
number, A and B have generation one more than C. However, A and B
have commit date higher than most of the commits reachable from
v4.8. When the walk reaches v4.8, we realize that it has PARENT1
and PARENT2 flags, so everything it can reach is marked as STALE,
including A. B has only the PARENT1 flag, so is not STALE.

When paint_down_to_common() is run using
compare_commits_by_commit_date, A and B are removed from the queue
early and C is inserted into the queue. At this point, C and the
rest of the queue entries are marked as STALE. The loop then
terminates.

When paint_down_to_common() is run using
compare_commits_by_gen_then_commit_date, B is removed from the
queue only after the many commits reachable from v4.8 are explored.
This causes the loop to run longer. The reason for this regression
is simple: the queue order is intended to not explore a commit
until everything that _could_ reach that commit is explored. From
the information gathered by the original ordering, we have no
guarantee that there is not a commit D reachable from v4.8 that
can also reach B. We gained absolute correctness in exchange for
a performance regression.

The performance regression is probably the worse option, since
these incorrect results in paint_down_to_common() are rare. The
topology required for the performance regression are less rare,
but still require multiple merge commits where the parents differ
greatly in generation number. In our example above, the commit A
is as important as the commit B to demonstrate the problem, since
otherwise the commit C will sit in the queue as non-stale just as
long in both orders.

The solution provided uses the min_generation parameter to decide
if we should use generation numbers in our ordering. When
min_generation is equal to zero, it means that the caller has no
known cutoff for the walk, so we should rely on our commit-date
heuristic as before; this is the case with merge_bases_many().
When min_generation is non-zero, then the caller knows a valuable
cutoff for the short-circuit mechanism; this is the case with
remove_redundant() and in_merge_bases_many().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-30 11:17:57 -07:00
Jeff King 9001dc2a74 convert "oidcmp() != 0" to "!oideq()"
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:

  if (oidcmp(E1, E2))

As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.

There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 11:32:49 -07:00
Junio C Hamano 1392c5d289 Merge branch 'js/larger-timestamps'
Portability fix.

* js/larger-timestamps:
  commit: use timestamp_t for author_date_slab
2018-08-27 14:33:44 -07:00
Jeff King 66e83d9b41 append_signoff: use size_t for string offsets
The append_signoff() function takes an "int" to specify the
number of bytes to ignore. Most callers just pass 0, and the
remainder use ignore_non_trailer() to skip over cruft.
That function also returns an int, and uses them internally.

On systems where size_t is larger than an int (i.e., most
64-bit systems), dealing with a ridiculously large commit
message could end up overflowing an int, producing
surprising results (e.g., returning a negative offset, which
would cause us to look outside the original string).

Let's consistently use size_t for these offsets through this
whole stack. As a bonus, this makes the meaning of
"ignore_footer" as an offset (and not a boolean) more clear.
But while we're here, let's also document the interface.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:08:51 -07:00
Derrick Stolee 1820703045 commit: use timestamp_t for author_date_slab
The author_date_slab is used to store the author date of a commit
when walking with the --author-date flag in rev-list or log. This
was added as an 'unsigned long' in

	81c6b38b "log: --author-date-order"

Since 'unsigned long' is ambiguous in its bit-ness across platforms
(64-bit in Linux, 32-bit in Windows, for example), most references
to the author dates in commit.c were converted to timestamp_t in

	dddbad72 "timestamp_t: a new data type for timestamps"

However, the slab definition was missed, leading to a mismatch in
the data types in Windows. This would not reveal itself as a bug
unless someone authors a commit after February 2106, but commits
can store anything as their author date.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 14:08:18 -07:00
Derrick Stolee 20fd6d5799 commit-graph: not compatible with grafts
Augment commit_graph_compatible(r) to return false when the given
repository r has commit grafts or is a shallow clone. Test that in these
situations we ignore existing commit-graph files and we do not write new
commit-graph files.

Helped-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 10:22:51 -07:00
Junio C Hamano 78a72ad4f8 Merge branch 'jt/commit-graph-per-object-store'
The singleton commit-graph in-core instance is made per in-core
repository instance.

* jt/commit-graph-per-object-store:
  commit-graph: add repo arg to graph readers
  commit-graph: store graph in struct object_store
  commit-graph: add free_commit_graph
  commit-graph: add missing forward declaration
  object-store: add missing include
  commit-graph: refactor preparing commit graph
2018-08-02 15:30:47 -07:00
Junio C Hamano 3a2a1dc170 Merge branch 'sb/object-store-lookup'
lookup_commit_reference() and friends have been updated to find
in-core object for a specific in-core repository instance.

* sb/object-store-lookup: (32 commits)
  commit.c: allow lookup_commit_reference to handle arbitrary repositories
  commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories
  tag.c: allow deref_tag to handle arbitrary repositories
  object.c: allow parse_object to handle arbitrary repositories
  object.c: allow parse_object_buffer to handle arbitrary repositories
  commit.c: allow get_cached_commit_buffer to handle arbitrary repositories
  commit.c: allow set_commit_buffer to handle arbitrary repositories
  commit.c: migrate the commit buffer to the parsed object store
  commit-slabs: remove realloc counter outside of slab struct
  commit.c: allow parse_commit_buffer to handle arbitrary repositories
  tag: allow parse_tag_buffer to handle arbitrary repositories
  tag: allow lookup_tag to handle arbitrary repositories
  commit: allow lookup_commit to handle arbitrary repositories
  tree: allow lookup_tree to handle arbitrary repositories
  blob: allow lookup_blob to handle arbitrary repositories
  object: allow lookup_object to handle arbitrary repositories
  object: allow object_as_type to handle arbitrary repositories
  tag: add repository argument to deref_tag
  tag: add repository argument to parse_tag_buffer
  tag: add repository argument to lookup_tag
  ...
2018-08-02 15:30:42 -07:00
Junio C Hamano b006f01ab5 Merge branch 'ds/commit-graph-fsck'
"git fsck" learns to make sure the optional commit-graph file is in
a sane state.

* ds/commit-graph-fsck: (23 commits)
  coccinelle: update commit.cocci
  commit-graph: update design document
  gc: automatically write commit-graph files
  commit-graph: add '--reachable' option
  commit-graph: use string-list API for input
  fsck: verify commit-graph
  commit-graph: verify contents match checksum
  commit-graph: test for corrupted octopus edge
  commit-graph: verify commit date
  commit-graph: verify generation number
  commit-graph: verify parent list
  commit-graph: verify root tree OIDs
  commit-graph: verify objects exist
  commit-graph: verify corrupt OID fanout and lookup
  commit-graph: verify required chunks are present
  commit-graph: verify catches corrupt signature
  commit-graph: add 'verify' subcommand
  commit-graph: load a root tree from specific graph
  commit: force commit to parse from object database
  commit-graph: parse commit from chosen graph
  ...
2018-08-02 15:30:40 -07:00
Junio C Hamano 37aac3e408 Merge branch 'bc/object-id'
Conversion from uchar[40] to struct object_id continues.

* bc/object-id:
  pretty: switch hard-coded constants to the_hash_algo
  sha1-file: convert constants to uses of the_hash_algo
  log-tree: switch GIT_SHA1_HEXSZ to the_hash_algo->hexsz
  diff: switch GIT_SHA1_HEXSZ to use the_hash_algo
  builtin/merge-recursive: make hash independent
  builtin/merge: switch to use the_hash_algo
  builtin/fmt-merge-msg: make hash independent
  builtin/update-index: simplify parsing of cacheinfo
  builtin/update-index: convert to using the_hash_algo
  refs/files-backend: use the_hash_algo for writing refs
  sha1-name: use the_hash_algo when parsing object names
  strbuf: allocate space with GIT_MAX_HEXSZ
  commit: express tree entry constants in terms of the_hash_algo
  hex: switch to using the_hash_algo
  tree-walk: replace hard-coded constants with the_hash_algo
  cache: update object ID functions for the_hash_algo
2018-08-02 15:30:39 -07:00
Derrick Stolee 5227c38566 commit-reach: move walk methods from commit.c
There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

The method declarations in commit.h are not touched by this commit and
will be moved in a following commit. Many consumers need to point to
commit-reach.h and that would bloat this commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20 15:38:54 -07:00
Junio C Hamano 00624d608c Merge branch 'sb/object-store-grafts'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-07-18 12:20:28 -07:00
Jonathan Tan dade47c06c commit-graph: add repo arg to graph readers
Add a struct repository argument to the functions in commit-graph.h that
read the commit graph. (This commit does not affect functions that write
commit graphs.)

Because the commit graph functions can now read the commit graph of any
repository, the global variable core_commit_graph has been removed.
Instead, the config option core.commitGraph is now read on the first
time in a repository that a commit is attempted to be parsed using its
commit graph.

This commit includes a test that exercises the functionality on an
arbitrary repository that is not the_repository.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-17 15:47:48 -07:00
Junio C Hamano 8295296458 Merge branch 'ds/commit-graph-fsck' into jt/commit-graph-per-object-store
* ds/commit-graph-fsck: (23 commits)
  coccinelle: update commit.cocci
  commit-graph: update design document
  gc: automatically write commit-graph files
  commit-graph: add '--reachable' option
  commit-graph: use string-list API for input
  fsck: verify commit-graph
  commit-graph: verify contents match checksum
  commit-graph: test for corrupted octopus edge
  commit-graph: verify commit date
  commit-graph: verify generation number
  commit-graph: verify parent list
  commit-graph: verify root tree OIDs
  commit-graph: verify objects exist
  commit-graph: verify corrupt OID fanout and lookup
  commit-graph: verify required chunks are present
  commit-graph: verify catches corrupt signature
  commit-graph: add 'verify' subcommand
  commit-graph: load a root tree from specific graph
  commit: force commit to parse from object database
  commit-graph: parse commit from chosen graph
  ...
2018-07-17 15:46:19 -07:00
brian m. carlson 2770ccbdb2 commit: express tree entry constants in terms of the_hash_algo
Specify these constants in terms of the size of the hash algorithm
currently in use.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-16 14:27:39 -07:00
Stefan Beller 1f6c72fe55 commit.c: allow lookup_commit_reference to handle arbitrary repositories
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:40 -07:00
Stefan Beller d9a05e74ec commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:40 -07:00
Stefan Beller 4ff7e5c936 commit.c: allow get_cached_commit_buffer to handle arbitrary repositories
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:40 -07:00
Stefan Beller 1a40fc4509 commit.c: allow set_commit_buffer to handle arbitrary repositories
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:40 -07:00
Stefan Beller 65ea9d4bec commit.c: migrate the commit buffer to the parsed object store
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:40 -07:00
Stefan Beller fd8030c739 commit.c: allow parse_commit_buffer to handle arbitrary repositories
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:40 -07:00
Stefan Beller bacf16874e commit: allow lookup_commit to handle arbitrary repositories
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller a74093da5e tag: add repository argument to deref_tag
Add a repository argument to allow the callers of deref_tag
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller 3ce85f7e5a commit: add repository argument to get_cached_commit_buffer
Add a repository argument to allow callers of get_cached_commit_buffer to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller 5e0c63604d commit: add repository argument to set_commit_buffer
Add a repository argument to allow callers of set_commit_buffer to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller 08f4f44501 commit: add repository argument to parse_commit_buffer
Add a repository argument to allow the callers of parse_commit_buffer
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller c1f5eb4962 commit: add repository argument to lookup_commit
Add a repository argument to allow callers of lookup_commit to be more
specific about which repository to handle. This is a small mechanical
change; it doesn't change the implementation to handle repositories
other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller 2122f6754c commit: add repository argument to lookup_commit_reference
Add a repository argument to allow callers of lookup_commit_reference
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller 21e1ee8f4f commit: add repository argument to lookup_commit_reference_gently
Add a repository argument to allow callers of
lookup_commit_reference_gently to be more specific about which
repository to handle. This is a small mechanical change; it doesn't
change the implementation to handle repositories other than
the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller f86bcc7b2c tree: add repository argument to lookup_tree
Add a repository argument to allow the callers of lookup_tree
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Stefan Beller 1268dfac1e object: add repository argument to object_as_type
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Stefan Beller 5abddd1eb7 object: add repository argument to lookup_object
Add a repository argument to allow callers of lookup_object to be more
specific about which repository to handle. This is a small mechanical
change; it doesn't change the implementation to handle repositories
other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Stefan Beller 109cd76dd3 object: add repository argument to parse_object
Add a repository argument to allow the callers of parse_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Junio C Hamano b16b60f71b Merge branch 'sb/object-store-grafts' into sb/object-store-lookup
* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-06-29 10:43:28 -07:00
Derrick Stolee 9b19adac6f commit: force commit to parse from object database
In anticipation of verifying commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:27:05 -07:00
Junio C Hamano 110240588d Merge branch 'sb/object-store-alloc'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-alloc:
  alloc: allow arbitrary repositories for alloc functions
  object: allow create_object to handle arbitrary repositories
  object: allow grow_object_hash to handle arbitrary repositories
  alloc: add repository argument to alloc_commit_index
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_blob_node
  object: add repository argument to grow_object_hash
  object: add repository argument to create_object
  repository: introduce parsed objects field
2018-06-25 13:22:38 -07:00
Junio C Hamano a856e7d69f Merge branch 'ds/commit-graph-lockfile-fix'
Update to ds/generation-numbers topic.

* ds/commit-graph-lockfile-fix:
  commit-graph: fix UX issue when .lock file exists
  commit-graph.txt: update design document
  merge: check config before loading commits
  commit: use generation number in remove_redundant()
  commit: add short-circuit to paint_down_to_common()
  commit: use generation numbers for in_merge_bases()
  ref-filter: use generation number for --contains
  commit-graph: always load commit-graph information
  commit: use generations in paint_down_to_common()
  commit-graph: compute generation numbers
  commit: add generation number to struct commit
  ref-filter: fix outdated comment on in_commit_list
2018-06-25 13:22:36 -07:00
Junio C Hamano b3b2aaf0fd Merge branch 'nd/commit-util-to-slab'
The in-core "commit" object had an all-purpose "void *util" field,
which was tricky to use especially in library-ish part of the
code.  All of the existing uses of the field has been migrated to a
more dedicated "commit-slab" mechanism and the field is eliminated.

* nd/commit-util-to-slab:
  commit.h: delete 'util' field in struct commit
  merge: use commit-slab in merge remote desc instead of commit->util
  log: use commit-slab in prepare_bases() instead of commit->util
  show-branch: note about its object flags usage
  show-branch: use commit-slab for commit-name instead of commit->util
  name-rev: use commit-slab for rev-name instead of commit->util
  bisect.c: use commit-slab for commit weight instead of commit->util
  revision.c: use commit-slab for show_source
  sequencer.c: use commit-slab to associate todo items to commits
  sequencer.c: use commit-slab to mark seen commits
  shallow.c: use commit-slab for commit depth instead of commit->util
  describe: use commit-slab for commit names instead of commit->util
  blame: use commit-slab for blame suspects instead of commit->util
  commit-slab: support shared commit-slab
  commit-slab.h: code split
2018-06-25 13:22:35 -07:00
Junio C Hamano 549ca8aa7c Merge branch 'jk/index-pack-maint'
"index-pack --strict" has been taught to make sure that it runs the
final object integrity checks after making the freshly indexed
packfile available to itself.

* jk/index-pack-maint:
  index-pack: correct install_packed_git() args
  index-pack: handle --strict checks of non-repo packs
  prepare_commit_graft: treat non-repository as a noop
2018-06-13 12:50:46 -07:00
Jeff King 14a9bd2898 prepare_commit_graft: treat non-repository as a noop
The parse_commit_buffer() function consults lookup_commit_graft()
to see if we need to rewrite parents. The latter will look
at $GIT_DIR/info/grafts. If you're outside of a repository,
then this will trigger a BUG() as of b1ef400eec (setup_git_env:
avoid blind fall-back to ".git", 2016-10-20).

It's probably uncommon to actually parse a commit outside of
a repository, but you can see it in action with:

  cd /not/a/git/repo
  git index-pack --strict /some/file.pack

This works fine without --strict, but the fsck checks will
try to parse any commits, triggering the BUG(). We can fix
that by teaching the graft code to behave as if there are no
grafts when we aren't in a repository.

Arguably index-pack (and fsck) are wrong to consider grafts
at all. So another solution is to disable grafts entirely
for those commands. But given that the graft feature is
deprecated anyway, it's not worth even thinking through the
ramifications that might have.

There is one other corner case I considered here. What
should:

  cd /not/a/git/repo
  export GIT_GRAFT_FILE=/file/with/grafts
  git index-pack --strict /some/file.pack

do? We don't have a repository, but the user has pointed us
directly at a graft file, which we could respect. I believe
this case did work that way prior to b1ef400eec. However,
fixing it now would be pretty invasive. Back then we would
just call into setup_git_env() even without a repository.
But these days it actually takes a git_dir argument. So
there would be a fair bit of refactoring of the setup code
involved.

Given the obscurity of this case, plus the fact that grafts
are deprecated and probably shouldn't work under index-pack
anyway, it's not worth pursuing further. This patch at least
un-breaks the common case where you're _not_ using grafts,
but we BUG() anyway trying to even find that out.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-01 11:48:54 +09:00
Junio C Hamano 42c8ce1c49 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (42 commits)
  merge-one-file: compute empty blob object ID
  add--interactive: compute the empty tree value
  Update shell scripts to compute empty tree object ID
  sha1_file: only expose empty object constants through git_hash_algo
  dir: use the_hash_algo for empty blob object ID
  sequencer: use the_hash_algo for empty tree object ID
  cache-tree: use is_empty_tree_oid
  sha1_file: convert cached object code to struct object_id
  builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
  builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
  wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
  submodule: convert several uses of EMPTY_TREE_SHA1_HEX
  sequencer: convert one use of EMPTY_TREE_SHA1_HEX
  merge: convert empty tree constant to the_hash_algo
  builtin/merge: switch tree functions to use object_id
  builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
  sha1-file: add functions for hex empty tree and blob OIDs
  builtin/receive-pack: avoid hard-coded constants for push certs
  diff: specify abbreviation size in terms of the_hash_algo
  upload-pack: replace use of several hard-coded constants
  ...
2018-05-30 14:04:10 +09:00
Junio C Hamano 352cf6cfe1 Merge branch 'js/deprecate-grafts'
The functionality of "$GIT_DIR/info/grafts" has been superseded by
the "refs/replace/" mechanism for some time now, but the internal
code had support for it in many places, which has been cleaned up
in order to drop support of the "grafts" mechanism.

* js/deprecate-grafts:
  Remove obsolete script to convert grafts to replace refs
  technical/shallow: describe why shallow cannot use replace refs
  technical/shallow: stop referring to grafts
  filter-branch: stop suggesting to use grafts
  Deprecate support for .git/info/grafts
  Add a test for `git replace --convert-graft-file`
  replace: introduce --convert-graft-file
  replace: prepare create_graft() for converting graft files wholesale
  replace: "libify" create_graft() and callees
  replace: avoid using die() to indicate a bug
  commit: Let the callback of for_each_mergetag return on error
  argv_array: offer to split a string by whitespace
2018-05-23 14:38:17 +09:00
Junio C Hamano c89b6e136e Merge branch 'ds/lazy-load-trees'
The code has been taught to use the duplicated information stored
in the commit-graph file to learn the tree object name for a commit
to avoid opening and parsing the commit object when it makes sense
to do so.

* ds/lazy-load-trees:
  coccinelle: avoid wrong transformation suggestions from commit.cocci
  commit-graph: lazy-load trees for commits
  treewide: replace maybe_tree with accessor methods
  commit: create get_commit_tree() method
  treewide: rename tree to maybe_tree
2018-05-23 14:38:13 +09:00
Derrick Stolee 04bc8d1ecc commit: use generation number in remove_redundant()
The static remove_redundant() method is used to filter a list
of commits by removing those that are reachable from another
commit in the list. This is used to remove all possible merge-
bases except a maximal, mutually independent set.

To determine these commits are independent, we use a number of
paint_down_to_common() walks and use the PARENT1, PARENT2 flags
to determine reachability. Since we only care about reachability
and not the full set of merge-bases between 'one' and 'twos', we
can use the 'min_generation' parameter to short-circuit the walk.

When no commit-graph exists, there is no change in behavior.

For a copy of the Linux repository, we measured the following
performance improvements:

git merge-base v3.3 v4.5

Before: 234 ms
 After: 208 ms
 Rel %: -11%

git merge-base v4.3 v4.5

Before: 102 ms
 After:  83 ms
 Rel %: -19%

The experiments above were chosen to demonstrate that we are
improving the filtering of the merge-base set. In the first
example, more time is spent walking the history to find the
set of merge bases before the remove_redundant() call. The
starting commits are closer together in the second example,
therefore more time is spent in remove_redundant(). The relative
change in performance differs as expected.

Reported-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22 12:36:34 +09:00
Derrick Stolee d7c1ec3efd commit: add short-circuit to paint_down_to_common()
When running 'git branch --contains', the in_merge_bases_many()
method calls paint_down_to_common() to discover if a specific
commit is reachable from a set of branches. Commits with lower
generation number are not needed to correctly answer the
containment query of in_merge_bases_many().

Add a new parameter, min_generation, to paint_down_to_common() that
prevents walking commits with generation number strictly less than
min_generation. If 0 is given, then there is no functional change.

For in_merge_bases_many(), we can pass commit->generation as the
cutoff, and this saves time during 'git branch --contains' queries
that would otherwise walk "around" the commit we are inspecting.

For a copy of the Linux repository, where HEAD is checked out at
v4.13~100, we get the following performance improvement for
'git branch --contains' over the previous commit:

Before: 0.21s
After:  0.13s
Rel %: -38%

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22 12:36:34 +09:00
Derrick Stolee f9b8908b85 commit: use generation numbers for in_merge_bases()
The containment algorithm for 'git branch --contains' is different
from that for 'git tag --contains' in that it uses is_descendant_of()
instead of contains_tag_algo(). The expensive portion of the branch
algorithm is computing merge bases.

When a commit-graph file exists with generation numbers computed,
we can avoid this merge-base calculation when the target commit has
a larger generation number than the initial commits.

Performance tests were run on a copy of the Linux repository where
HEAD is contained in v4.13 but no earlier tag. Also, all tags were
copied to branches and 'git branch --contains' was tested:

Before: 60.0s
After:   0.4s
Rel %: -99.3%

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22 12:36:34 +09:00
Derrick Stolee e2838d85b6 commit-graph: always load commit-graph information
Most code paths load commits using lookup_commit() and then
parse_commit(). In some cases, including some branch lookups, the commit
is parsed using parse_object_buffer() which side-steps parse_commit() in
favor of parse_commit_buffer().

With generation numbers in the commit-graph, we need to ensure that any
commit that exists in the commit-graph file has its generation number
loaded.

Create new load_commit_graph_info() method to fill in the information
for a commit that exists only in the commit-graph file. Call it from
parse_commit_buffer() after loading the other commit information from
the given buffer. Only fill this information when specified by the
'check_graph' parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22 12:36:34 +09:00
Derrick Stolee 3afc679b3c commit: use generations in paint_down_to_common()
Define compare_commits_by_gen_then_commit_date(), which uses generation
numbers as a primary comparison and commit date to break ties (or as a
comparison when both commits do not have computed generation numbers).

Since the commit-graph file is closed under reachability, we know that
all commits in the file have generation at most GENERATION_NUMBER_MAX
which is less than GENERATION_NUMBER_INFINITY.

This change does not affect the number of commits that are walked during
the execution of paint_down_to_common(), only the order that those
commits are inspected. In the case that commit dates violate topological
order (i.e. a parent is "newer" than a child), the previous code could
walk a commit twice: if a commit is reached with the PARENT1 bit, but
later is re-visited with the PARENT2 bit, then that PARENT2 bit must be
propagated to its parents. Using generation numbers avoids this extra
effort, even if it is somewhat rare.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22 12:36:34 +09:00
Nguyễn Thái Ngọc Duy e2e5ac2303 merge: use commit-slab in merge remote desc instead of commit->util
It's done so that commit->util can be removed. See more explanation in
the commit that removes commit->util.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21 14:07:20 +09:00
Stefan Beller b9dbddf6da commit: allow lookup_commit_graft to handle arbitrary repositories
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller 2f6c767fd4 commit: allow prepare_commit_graft to handle arbitrary repositories
Move the global variable 'commit_graft_prepared' into the object
pool and convert the function prepare_commit_graft to work
an arbitrary repositories.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller 0437a2e365 cache: convert get_graft_file to handle arbitrary repositories
This conversion was done without the #define trick used in the earlier
series refactoring to have better repository access, because this function
is easy to review, as all lines are converted and it has only one caller.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Brandon Williams d0e5dd0ed4 commit: convert read_graft_file to handle arbitrary repositories
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Brandon Williams a3b78e833b commit: convert register_commit_graft to handle arbitrary repositories
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Brandon Williams e808656c46 commit: convert commit_graft_pos() to handle arbitrary repositories
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller c88134870e shallow: add repository argument to is_repository_shallow
Add a repository argument to allow callers of is_repository_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Jonathan Nieder 1f93ecd1ab commit: add repository argument to lookup_commit_graft
Add a repository argument to allow callers of lookup_commit_graft to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Jonathan Nieder 3ee37656ee commit: add repository argument to prepare_commit_graft
Add a repository argument to allow the caller of prepare_commit_graft
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Jonathan Nieder 02ba3e1a05 commit: add repository argument to read_graft_file
Add a repository argument to allow the caller of read_graft_file to be
more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Jonathan Nieder 3f5787f806 commit: add repository argument to register_commit_graft
Add a repository argument to allow callers of register_commit_graft to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Jonathan Nieder be479e801d commit: add repository argument to commit_graft_pos
Add a repository argument to allow callers of commit_graft_pos to be
more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Jonathan Nieder 6a1a79fd14 object: move grafts to object parser
Grafts are only meaningful in the context of a single repository.
Therefore they cannot be global.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Stefan Beller cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Stefan Beller 14ba97f81c alloc: allow arbitrary repositories for alloc functions
We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.

We record all memory allocation in alloc.c, and free them in
clear_alloc_state, which is called for all repositories except
the_repository.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:16:50 +09:00
Stefan Beller 8ba0e5ec57 alloc: add repository argument to alloc_commit_node
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Stefan Beller 68f95d382b object: add repository argument to create_object
Add a repository argument to allow the callers of create_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 12:12:36 +09:00
Junio C Hamano b10edb2df5 Merge branch 'ds/commit-graph'
Precompute and store information necessary for ancestry traversal
in a separate file to optimize graph walking.

* ds/commit-graph:
  commit-graph: implement "--append" option
  commit-graph: build graph from starting commits
  commit-graph: read only from specific pack-indexes
  commit: integrate commit graph with commit parsing
  commit-graph: close under reachability
  commit-graph: add core.commitGraph setting
  commit-graph: implement git commit-graph read
  commit-graph: implement git-commit-graph write
  commit-graph: implement write_commit_graph()
  commit-graph: create git-commit-graph builtin
  graph: add commit graph design document
  commit-graph: add format document
  csum-file: refactor finalize_hashfile() method
  csum-file: rename hashclose() to finalize_hashfile()
2018-05-08 15:59:20 +09:00
brian m. carlson 26ea3e7dca commit: convert uses of get_sha1_hex to get_oid_hex
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:51 +09:00
Johannes Schindelin f9f99b3f7d Deprecate support for .git/info/grafts
The grafts feature was a convenient way to "stitch together" ancient
history to the fresh start of linux.git.

Its implementation is, however, not up to Git's standards, as there are
too many ways where it can lead to surprising and unwelcome behavior.

For example, when pushing from a repository with active grafts, it is
possible to miss commits that have been "grafted out", resulting in a
broken state on the other side.

Also, the grafts feature is limited to "rewriting" commits' list of
parents, it cannot replace anything else.

The much younger feature implemented as `git replace` set out to remedy
those limitations and dangerous bugs.

Seeing as `git replace` is pretty mature by now (since 4228e8bc98
(replace: add --graft option, 2014-07-19) it can perform the graft
file's duties), it is time to deprecate support for the graft file, and
to retire it eventually.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-30 11:12:31 +09:00
Johannes Schindelin fef461ea5d commit: Let the callback of for_each_mergetag return on error
This is yet another patch to be filed under the keyword "libification".

There is one subtle change in behavior here, where a `git log` that has
been asked to show the mergetags would now stop reporting the mergetags
upon the first failure, whereas previously, it would have continued to the
next mergetag, if any.

In practice, that change should not matter, as it is 1) uncommon to
perform octopus merges using multiple tags as merge heads, and 2) when the
user asks to be shown those tags, they really should be there.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26 12:52:57 +09:00
Derrick Stolee 7b8a21dba1 commit-graph: lazy-load trees for commits
The commit-graph file provides quick access to commit data, including
the OID of the root tree for each commit in the graph. When performing
a deep commit-graph walk, we may not need to load most of the trees
for these commits.

Delay loading the tree object for a commit loaded from the graph
until requested via get_commit_tree(). Do not lazy-load trees for
commits not in the graph, since that requires duplicate parsing
and the relative peformance improvement when trees are not needed
is small.

On the Linux repository, performance tests were run for the following
command:

    git log --graph --oneline -1000

    Before: 0.92s
    After:  0.66s
    Rel %: -28.3%

Adding '-- kernel/' to the command requires loading the root tree
for every commit that is walked. There was no measureable performance
change as a result of this patch.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:47:16 +09:00
Derrick Stolee 5bb03de102 commit: create get_commit_tree() method
While walking the commit graph, we load struct commit objects into
the object cache. During this process, we also load struct tree
objects for the root tree of each of these commits. We load these
objects even if we are only computing commit reachability information,
such as a merge base or ahead/behind information.

Create get_commit_tree() as a first step to removing direct
references to the 'maybe_tree' member of struct commit.

Create get_commit_tree_oid() as a shortcut for several references
to "&commit->maybe_tree->object.oid" in the codebase.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:47:16 +09:00
Derrick Stolee 891435d55d treewide: rename tree to maybe_tree
Using the commit-graph file to walk commit history removes the large
cost of parsing commits during the walk. This exposes a performance
issue: lookup_tree() takes a large portion of the computation time,
even when Git never uses those trees.

In anticipation of lazy-loading these trees, rename the 'tree' member
of struct commit to 'maybe_tree'. This serves two purposes: it hints
at the future role of possibly being NULL even if the commit has a
valid tree, and it allows for unambiguous transformation from simple
member access (i.e. commit->maybe_tree) to method access.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:47:16 +09:00
Junio C Hamano 2d5792f071 Merge branch 'bw/c-plus-plus' into ds/lazy-load-trees
* bw/c-plus-plus: (37 commits)
  replace: rename 'new' variables
  trailer: rename 'template' variables
  tempfile: rename 'template' variables
  wrapper: rename 'template' variables
  environment: rename 'namespace' variables
  diff: rename 'template' variables
  environment: rename 'template' variables
  init-db: rename 'template' variables
  unpack-trees: rename 'new' variables
  trailer: rename 'new' variables
  submodule: rename 'new' variables
  split-index: rename 'new' variables
  remote: rename 'new' variables
  ref-filter: rename 'new' variables
  read-cache: rename 'new' variables
  line-log: rename 'new' variables
  imap-send: rename 'new' variables
  http: rename 'new' variables
  entry: rename 'new' variables
  diffcore-delta: rename 'new' variables
  ...
2018-04-11 10:46:32 +09:00
Derrick Stolee 177722b344 commit: integrate commit graph with commit parsing
Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:43:02 +09:00
brian m. carlson b4f5aca40e sha1_file: convert read_sha1_file to struct object_id
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file.  Do the same for read_sha1_file_extended.

Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id.  Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
brian m. carlson e816caa07b sha1_file: convert assert_sha1_type to object_id
Convert this function to take a pointer to struct object_id and rename
it to assert_oid_type.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
Junio C Hamano 169c9c0169 Merge branch 'bw/c-plus-plus'
Avoid using identifiers that clash with C++ keywords.  Even though
it is not a goal to compile Git with C++ compilers, changes like
this help use of code analysis tools that targets C++ on our
codebase.

* bw/c-plus-plus: (37 commits)
  replace: rename 'new' variables
  trailer: rename 'template' variables
  tempfile: rename 'template' variables
  wrapper: rename 'template' variables
  environment: rename 'namespace' variables
  diff: rename 'template' variables
  environment: rename 'template' variables
  init-db: rename 'template' variables
  unpack-trees: rename 'new' variables
  trailer: rename 'new' variables
  submodule: rename 'new' variables
  split-index: rename 'new' variables
  remote: rename 'new' variables
  ref-filter: rename 'new' variables
  read-cache: rename 'new' variables
  line-log: rename 'new' variables
  imap-send: rename 'new' variables
  http: rename 'new' variables
  entry: rename 'new' variables
  diffcore-delta: rename 'new' variables
  ...
2018-03-06 14:54:07 -08:00
Brandon Williams 258c03eb45 commit: rename 'new' variables
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-22 10:08:05 -08:00
Junio C Hamano 8be8342b4c Merge branch 'po/object-id'
Conversion from uchar[20] to struct object_id continues.

* po/object-id:
  sha1_file: rename hash_sha1_file_literally
  sha1_file: convert write_loose_object to object_id
  sha1_file: convert force_object_loose to object_id
  sha1_file: convert write_sha1_file to object_id
  notes: convert write_notes_tree to object_id
  notes: convert combine_notes_* to object_id
  commit: convert commit_tree* to object_id
  match-trees: convert splice_tree to object_id
  cache: clear whole hash buffer with oidclr
  sha1_file: convert hash_sha1_file to object_id
  dir: convert struct sha1_stat to use object_id
  sha1_file: convert pretend_sha1_file to object_id
2018-02-15 14:55:43 -08:00
Brandon Williams debca9d2fe object: rename function 'typename' to 'type_name'
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-14 13:10:05 -08:00
Junio C Hamano cbf0240f82 Merge branch 'sg/cocci-move-array'
Code clean-up.

* sg/cocci-move-array:
  Use MOVE_ARRAY
2018-02-13 13:39:13 -08:00
Patryk Obara a09c985eae sha1_file: convert write_sha1_file to object_id
Convert the definition and declaration of write_sha1_file to
struct object_id and adjust usage of this function.

This commit also converts static function write_sha1_file_prepare, as it
is closely related.

Rename these functions to write_object_file and
write_object_file_prepare respectively.

Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents
wherever possible.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Patryk Obara 5078f34459 commit: convert commit_tree* to object_id
Convert the definitions and declarations of commit_tree and
commit_tree_extended to use struct object_id and adjust all usages of
these functions.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
SZEDER Gábor f919ffebed Use MOVE_ARRAY
Use the helper macro MOVE_ARRAY to move arrays.  This is shorter and
safer, as it automatically infers the size of elements.

Patch generated by Coccinelle and contrib/coccinelle/array.cocci in
Travis CI's static analysis build job.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-22 11:32:51 -08:00
René Scharfe 6fcec2f9ae commit: remove unused function clear_commit_marks_for_object_array()
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-28 13:50:05 -08:00
René Scharfe abc035126a commit: use clear_commit_marks_many() in remove_redundant()
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-28 13:50:05 -08:00
René Scharfe 07f7d55a34 commit: avoid allocation in clear_commit_marks_many()
Pass the entries of the commit array directly to clear_commit_marks_1()
instead of adding them to a commit_list first.  The function clears the
commit and any first parent without allocation; only higher numbered
parents are added to a list for later treatment.  This change extends
that optimization to clear_commit_marks_many().

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-28 13:50:05 -08:00
Martin Ågren 4da72644b7 reduce_heads: fix memory leaks
We currently have seven callers of `reduce_heads(foo)`. Six of them do
not use the original list `foo` again, and actually, all six of those
end up leaking it.

Introduce and use `reduce_heads_replace(&foo)` as a leak-free version of
`foo = reduce_heads(foo)` to fix several of these. Fix the remaining
leaks using `free_commit_list()`.

While we're here, document `reduce_heads()` and mark it as `extern`.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-08 11:34:00 +09:00
Junio C Hamano 69c54c7284 Merge branch 'ma/leakplugs'
Memory leaks in various codepaths have been plugged.

* ma/leakplugs:
  pack-bitmap[-write]: use `object_array_clear()`, don't leak
  object_array: add and use `object_array_pop()`
  object_array: use `object_array_clear()`, not `free()`
  leak_pending: use `object_array_clear()`, not `free()`
  commit: fix memory leak in `reduce_heads()`
  builtin/commit: fix memory leak in `prepare_index()`
2017-09-29 11:23:43 +09:00
Martin Ågren cb7b29eb67 commit: fix memory leak in reduce_heads()
We don't free the temporary scratch space we use with
`remove_redundant()`. Free it similar to how we do it in
`get_merge_bases_many_0()`.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-24 10:05:51 +09:00
Junio C Hamano d811ba1897 Merge branch 'rs/strbuf-leakfix'
Many leaks of strbuf have been fixed.

* rs/strbuf-leakfix: (34 commits)
  wt-status: release strbuf after use in wt_longstatus_print_tracking()
  wt-status: release strbuf after use in read_rebase_todolist()
  vcs-svn: release strbuf after use in end_revision()
  utf8: release strbuf on error return in strbuf_utf8_replace()
  userdiff: release strbuf after use in userdiff_get_textconv()
  transport-helper: release strbuf after use in process_connect_service()
  sequencer: release strbuf after use in save_head()
  shortlog: release strbuf after use in insert_one_record()
  sha1_file: release strbuf on error return in index_path()
  send-pack: release strbuf on error return in send_pack()
  remote: release strbuf after use in set_url()
  remote: release strbuf after use in migrate_file()
  remote: release strbuf after use in read_remote_branches()
  refs: release strbuf on error return in write_pseudoref()
  notes: release strbuf after use in notes_copy_from_stdin()
  merge: release strbuf after use in write_merge_heads()
  merge: release strbuf after use in save_state()
  mailinfo: release strbuf on error return in handle_boundary()
  mailinfo: release strbuf after use in handle_from()
  help: release strbuf on error return in exec_woman_emacs()
  ...
2017-09-19 10:47:57 +09:00
Rene Scharfe e505146dac commit: release strbuf on error return in commit_tree_extended()
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07 08:49:27 +09:00
Patryk Obara cfa5bf1608 commit: rewrite read_graft_line
Old implementation determined number of hashes by dividing length of
line by length of hash, which works only if all hash representations
have same length.

New graft line parser works in two phases:

  1. In first phase line is scanned to verify correctness and compute
     number of hashes, then graft struct is allocated.

  2. In second phase line is scanned again to fill up already allocated
     graft struct.

This way graft parsing code can support different sizes of hashes
without any further code adaptations.

A number of alternative implementations were considered and discarded:

  - Modifying graft structure to store oid_array instead of FLEXI_ARRAY
    indicates undesirable usage of struct to readers.

  - Parsing into temporary string_list or oid_array complicates code
    by adding more return paths, as these structures needs to be
    cleared before returning from function.

  - Determining number of hashes by counting separators might cause
    maintenance issues, if this function needs to be modified in future
    again.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-18 12:41:06 -07:00