development/git - HydraGit

development/git

Fork 0

mirror of https://github.com/git/git synced 2024-08-24 02:11:08 +00:00

Commit graph

Author SHA1 Message Date

Author	SHA1	Message	Date
Elijah Newren	9ae39fef7f	merge-ort: avoid assuming all renames detected In commit `8b09a900a1` ("merge-ort: restart merge with cached renames to reduce process entry cost", 2021-07-16), we noted that in the merge-ort steps of collect_merge_info() detect_and_process_renames() process_entries() that process_entries() was expensive, and we could often make it cheaper by changing this to collect_merge_info() detect_and_process_renames() <cache all the renames, and restart> collect_merge_info() detect_and_process_renames() process_entries() because the second collect_merge_info() would be cheaper (we could avoid traversing into some directories), the second detect_and_process_renames() would be free since we had already detected all renames, and then process_entries() has far fewer entries to handle. However, this was built on the assumption that the first detect_and_process_renames() actually detected all potential renames. If someone has merge.renameLimit set to some small value, that assumption is violated which manifests later with the following message: $ git -c merge.renameLimit=1 rebase upstream ... git: merge-ort.c:546: clear_or_reinit_internal_opts: Assertion `renames->cached_pairs_valid_side == 0' failed. Turn off this cache-renames-and-restart whenever we cannot detect all renames, and add a testcase that would have caught this problem. Reported-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Elijah Newren <newren@gmail.com> Tested-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-17 14:24:22 -08:00
Elijah Newren	25e65b6dd5	merge-ort, diffcore-rename: employ cached renames when possible When there are many renames between the old base of a series of commits and the new base, the way sequencer.c, merge-recursive.c, and diffcore-rename.c have traditionally split the work resulted in redetecting the same renames with each and every commit being transplanted. To address this, the last several commits have been creating a cache of rename detection results, determining when it was safe to use such a cache in subsequent merge operations, adding helper functions, and so on. See the previous half dozen commit messages for additional discussion of this optimization, particularly the message a few commits ago entitled "add code to check for whether cached renames can be reused". This commit finally ties all of that work together, modifying the merge algorithm to make use of these cached renames. For the testcases mentioned in commit `557ac0350d` ("merge-ort: begin performance work; instrument with trace2_region_* calls", 2020-10-28), this change improves the performance as follows: Before After no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms That's a fairly small improvement, but mostly because the previous optimizations were so effective for these particular testcases; this optimization only kicks in when the others don't. If we undid the basename-guided rename detection and skip-irrelevant-renames optimizations, then we'd see that this series by itself improved performance as follows: Before Basename Series After Just This Series no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s Since this optimization kicks in to help accelerate cases where the previous optimizations do not apply, this last comparison shows that this cached-renames optimization has the potential to help signficantly in cases that don't meet the requirements for the other optimizations to be effective. The changes made in this optimization also lay some important groundwork for a future optimization around having collect_merge_info() avoid recursing into subtrees in more cases. However, for this optimization to be effective, merge_switch_to_result() should only be called when the rebase or cherry-pick operation has either completed or hit a case where the user needs to resolve a conflict or edit the result. If it is called after every commit, as sequencer.c does, then the working tree and index are needlessly updated with every commit and the cached metadata is tossed, defeating this optimization. Refactoring sequencer.c to only call merge_switch_to_result() at the end of the operation is a bigger undertaking, and the practical benefits of this optimization will not be realized until that work is performed. Since `test-tool fast-rebase` only updates at the end of the operation, it was used to obtain the timings above. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-20 15:40:39 +09:00
Elijah Newren	a22099f552	t6429: testcases for remembering renames We will soon be adding an optimization that caches (in memory only, never written to disk) upstream renames during a sequence of merges such as occurs during a cherry-pick or rebase operation. Add several tests meant to stress such an implementation to ensure it does the right thing, and include a test whose outcome we will later change due to this optimization as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-20 15:40:39 +09:00

Elijah Newren

9ae39fef7f

merge-ort: avoid assuming all renames detected

In commit 8b09a900a1 ("merge-ort: restart merge with cached renames to
reduce process entry cost", 2021-07-16), we noted that in the merge-ort
steps of
    collect_merge_info()
    detect_and_process_renames()
    process_entries()
that process_entries() was expensive, and we could often make it cheaper
by changing this to
    collect_merge_info()
    detect_and_process_renames()
    <cache all the renames, and restart>
    collect_merge_info()
    detect_and_process_renames()
    process_entries()
because the second collect_merge_info() would be cheaper (we could avoid
traversing into some directories), the second
detect_and_process_renames() would be free since we had already detected
all renames, and then process_entries() has far fewer entries to handle.

However, this was built on the assumption that the first
detect_and_process_renames() actually detected all potential renames.
If someone has merge.renameLimit set to some small value, that
assumption is violated which manifests later with the following message:

    $ git -c merge.renameLimit=1 rebase upstream
    ...
    git: merge-ort.c:546: clear_or_reinit_internal_opts: Assertion
    `renames->cached_pairs_valid_side == 0' failed.

Turn off this cache-renames-and-restart whenever we cannot detect all
renames, and add a testcase that would have caught this problem.

Reported-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Tested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2022-01-17 14:24:22 -08:00

Elijah Newren

25e65b6dd5

merge-ort, diffcore-rename: employ cached renames when possible

When there are many renames between the old base of a series of commits
and the new base, the way sequencer.c, merge-recursive.c, and
diffcore-rename.c have traditionally split the work resulted in
redetecting the same renames with each and every commit being
transplanted.  To address this, the last several commits have been
creating a cache of rename detection results, determining when it was
safe to use such a cache in subsequent merge operations, adding helper
functions, and so on.  See the previous half dozen commit messages for
additional discussion of this optimization, particularly the message a
few commits ago entitled "add code to check for whether cached renames
can be reused".  This commit finally ties all of that work together,
modifying the merge algorithm to make use of these cached renames.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.665 s ±  0.129 s     5.622 s ±  0.059 s
    mega-renames:     11.435 s ±  0.158 s    10.127 s ±  0.073 s
    just-one-mega:   494.2  ms ±  6.1  ms   500.3  ms ±  3.8  ms

That's a fairly small improvement, but mostly because the previous
optimizations were so effective for these particular testcases; this
optimization only kicks in when the others don't.  If we undid the
basename-guided rename detection and skip-irrelevant-renames
optimizations, then we'd see that this series by itself improved
performance as follows:

                   Before Basename Series   After Just This Series
    no-renames:      13.815 s ±  0.062 s      5.697 s ±  0.080 s
    mega-renames:  1799.937 s ±  0.493 s    205.709 s ±  0.457 s

Since this optimization kicks in to help accelerate cases where the
previous optimizations do not apply, this last comparison shows that
this cached-renames optimization has the potential to help signficantly
in cases that don't meet the requirements for the other optimizations to
be effective.

The changes made in this optimization also lay some important groundwork
for a future optimization around having collect_merge_info() avoid
recursing into subtrees in more cases.

However, for this optimization to be effective, merge_switch_to_result()
should only be called when the rebase or cherry-pick operation has
either completed or hit a case where the user needs to resolve a
conflict or edit the result.  If it is called after every commit, as
sequencer.c does, then the working tree and index are needlessly updated
with every commit and the cached metadata is tossed, defeating this
optimization.  Refactoring sequencer.c to only call
merge_switch_to_result() at the end of the operation is a bigger
undertaking, and the practical benefits of this optimization will not be
realized until that work is performed.  Since `test-tool fast-rebase`
only updates at the end of the operation, it was used to obtain the
timings above.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2021-05-20 15:40:39 +09:00

Elijah Newren

a22099f552

t6429: testcases for remembering renames

We will soon be adding an optimization that caches (in memory only,
never written to disk) upstream renames during a sequence of merges such
as occurs during a cherry-pick or rebase operation.  Add several tests
meant to stress such an implementation to ensure it does the right
thing, and include a test whose outcome we will later change due to this
optimization as well.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2021-05-20 15:40:39 +09:00

3 commits