git/contrib
Zach FettersMoore 98ba49ccc2 subtree: fix split processing with multiple subtrees present
When there are multiple subtrees present in a repository and they are
all using 'git subtree split', the 'split' command can take a
significant (and constantly growing) amount of time to run even when
using the '--rejoin' flag. This is due to the fact that when processing
commits to determine the last known split to start from when looking
for changes, if there has been a split/merge done from another subtree
there will be 2 split commits, one mainline and one subtree, for the
second subtree that are part of the processing. The non-mainline
subtree split commit will cause the processing to always need to search
the entire history of the given subtree as part of its processing even
though those commits are totally irrelevant to the current subtree
split being run.

To see this in practice you can use the open source GitHub repo
'apollo-ios-dev' and do the following in order:

-Make changes to a file in the 'apollo-ios' and 'apollo-ios-codegen'
 directories
-Create a commit containing these changes
-Do a split on apollo-ios-codegen
   - Do a fetch on the subtree repo
      - git fetch git@github.com:apollographql/apollo-ios-codegen.git
   - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
   - Depending on the current state of the 'apollo-ios-dev' repo
     you may see the issue at this point if the last split was on
     apollo-ios
-Do a split on apollo-ios
   - Do a fetch on the subtree repo
      - git fetch git@github.com:apollographql/apollo-ios.git
   - git subtree split --prefix=apollo-ios --squash --rejoin
-Make changes to a file in apollo-ios-codegen
-Create a commit containing the change(s)
-Do a split on apollo-ios-codegen
   - git subtree split --prefix=apollo-ios-codegen --squash --rejoin
-To see that the patch fixes the issue, use the custom subtree
 script in the repo: follow the same steps as above, but run
 'git-subtree.sh ...' instead of 'git subtree ...'
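
The steps above can be sketched as a small shell script, meant to be
sourced from a clone of 'apollo-ios-dev'. The helper names ('run',
'split_subtree', 'reproduce') and the DRY_RUN switch are assumptions of
this sketch, not part of the patch. Editing and committing files is left
as a manual step.

```shell
#!/bin/sh
# Sketch of the reproduction steps. DRY_RUN is an assumption of this
# sketch: when it is 1 (the default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run () {
    if test "$DRY_RUN" = 1
    then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fetch the subtree repo, then split with --squash --rejoin.
split_subtree () {
    prefix="$1"
    run git fetch "git@github.com:apollographql/$prefix.git"
    run git subtree split --prefix="$prefix" --squash --rejoin
}

reproduce () {
    # Before calling: change a file under each prefix and commit (by hand).
    split_subtree apollo-ios-codegen
    split_subtree apollo-ios
    # Change and commit a file under apollo-ios-codegen again (by hand);
    # the final split below is the one that shows the slowdown.
    run git subtree split --prefix=apollo-ios-codegen --squash --rejoin
}
```

Calling 'reproduce' with DRY_RUN=1 only lists the commands. To actually
run them, set DRY_RUN=0 and pause between the splits to make the manual
commits.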

You will see that the final split looks for the last split on
apollo-ios-codegen to use as its starting point for processing
commits. Since there is a split commit from apollo-ios in between the
2 splits run on apollo-ios-codegen, the processing ends up traversing
the entire history of apollo-ios. The time the split takes therefore
grows with the length of apollo-ios's history, even though none of
these commits are relevant to the split being done on
apollo-ios-codegen.

So this commit changes the processing of commits for the split
command to ignore non-mainline commits from other subtrees, such as
apollo-ios in the above breakdown. It does so by adding a new
function, 'should_ignore_subtree_commit', which is called during
'process_split_commit'. This allows the split/rejoin processing to
still function as expected but removes all of the unnecessary
processing that currently takes place and greatly inflates the
processing time. In the above example, the final split previously
took ~10-12 minutes, while after this fix it takes seconds.
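
The rule described above can be sketched as a small shell predicate.
This is not the function from the patch: the helper name
'is_foreign_subtree_commit' and its arguments are hypothetical, and it
inspects a commit message passed in as text rather than running git.
It relies only on the 'git-subtree-dir:' and 'git-subtree-mainline:'
trailers that git-subtree writes into squash and rejoin commits.

```shell
# Hypothetical helper, not the implementation from the patch.
# Usage: is_foreign_subtree_commit MESSAGE PREFIX
# Returns 0 (ignore the commit) when MESSAGE is a non-mainline subtree
# commit that belongs to a prefix other than PREFIX.
is_foreign_subtree_commit () {
    msg="$1"
    prefix="$2"

    # Extract the subtree directory recorded in the commit message.
    dir=$(printf '%s\n' "$msg" | sed -n 's/^git-subtree-dir: //p' | head -n 1)

    # No trailer: an ordinary commit, so it must be processed.
    test -n "$dir" || return 1

    # A mainline trailer marks the rejoin merge on the main history.
    if printf '%s\n' "$msg" | grep -q '^git-subtree-mainline: '
    then
        return 1
    fi

    # Ignore only when the commit belongs to a different subtree.
    test "$dir" != "$prefix"
}
```

With a squash commit message for 'apollo-ios' and the prefix
'apollo-ios-codegen', the predicate succeeds, which is the case where
the walk over the other subtree's history can be skipped.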

Added a test to validate that the proposed fix
solves the issue.

The test accomplishes this by checking the output
of the split command to ensure the output from
the progress of 'process_split_commit' function
that represents the 'extracount' of commits
processed remains at 0, meaning none of the commits
from the second subtree were processed.

This was tested against the original functionality
to show the test failed, and then with this fix
to show the test passes.

This illustrates that, with multiple subtrees A and B, a split on
subtree B does not traverse the entire history of subtree A.
That traversal is unnecessary and would cause the 'extracount' of
processed commits to climb with the number of commits in the
history of subtree A.
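
A check along these lines could look like the sketch below. It is not
the test added by the patch. It assumes the progress output consists of
entries such as '3/3 (2) [0]', separated by carriage returns, where the
bracketed number is the 'extracount'. That format and the helper name
'max_extracount' are assumptions of this sketch.

```shell
# Hypothetical helper: print the largest bracketed count found in
# progress output read from stdin (prints nothing if none is found).
max_extracount () {
    # Progress entries are assumed to be separated by carriage returns.
    tr '\r' '\n' |
    # Keep only the number inside the trailing [N] of each entry.
    sed -n 's/.*\[\([0-9][0-9]*\)\][[:space:]]*$/\1/p' |
    sort -n |
    tail -n 1
}
```

Assuming progress is written to stderr, piping the combined output of
the split command ('2>&1') into 'max_extracount' and comparing the
result with 0 would show whether any commits from the other subtree
were processed.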

Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com>
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-25 10:56:34 -08:00
buildsystems Merge branch 'js/doc-unit-tests-with-cmake' 2023-12-09 16:37:47 -08:00
coccinelle config: pass kvi to die_bad_number() 2023-06-28 14:06:40 -07:00
completion git-prompt: stop manually parsing HEAD with unknown ref formats 2024-01-08 11:21:45 -08:00
contacts
credential Merge branch 'mh/credential-erase-improvements-more' 2023-08-28 09:51:16 -07:00
diff-highlight perl: bump the required Perl version to 5.8.1 from 5.8.0 2023-11-17 07:26:32 +09:00
emacs
examples
fast-import import-tars: ignore the global PAX header 2020-03-24 14:39:47 -07:00
git-jump git-jump: admit to passing merge mode args to ls-files 2023-10-05 12:55:38 -07:00
git-shell-commands
hg-to-git hg-to-git: make it compatible with both python3 and python2 2019-09-18 12:03:05 -07:00
hooks multimail: stop shipping a copy 2021-06-11 13:35:19 +09:00
long-running-filter
mw-to-git Merge branch 'tz/send-email-negatable-options' 2023-12-09 16:37:51 -08:00
persistent-https
remote-helpers
stats
subtree subtree: fix split processing with multiple subtrees present 2024-01-25 10:56:34 -08:00
thunderbird-patch-inline
update-unicode
vscode vscode: improve tab size and wrapping 2022-06-27 15:37:44 -07:00
workdir
coverage-diff.sh
git-resurrect.sh contrib/git-resurrect.sh: use hash-agnostic OID pattern 2020-10-08 11:48:56 -07:00
README doc: fix some typos, grammar and wording issues 2023-10-05 12:55:38 -07:00
remotes2config.sh
rerere-train.sh contrib/rerere-train: avoid useless gpg sign in training 2022-07-19 11:24:08 -07:00

Contributed Software

Although these pieces are available as part of the official git
source tree, they have a somewhat different status.  The
intention is to keep interesting tools around git here, maybe
even experimental ones, to give users easier access to them,
and to give tools wider exposure, so that they can be improved
faster.

I am not expecting to touch these myself that much.  As far as
my day-to-day operation is concerned, these subdirectories are
owned by their respective primary authors.  I am willing to help
if users of these components and the contrib/ subtree "owners"
have technical/design issues to resolve, but the initiative to
fix and/or enhance things _must_ be on the side of the subtree
owners.  IOW, I won't be actively looking for bugs and room for
enhancement in them as the git maintainer -- I may only do so
just as one of the users when I want to scratch my own itch.  If
you have patches to things in contrib/ area, the patch should be
first sent to the primary author, and then the primary author
should ack and forward it to me (git pull request is nicer).
This is the same way I have been treating gitk, and to a
lesser degree various foreign SCM interfaces, so you know the
drill.

I expect things that start their life in the contrib/ area
to graduate out of contrib/ once they mature, either by becoming
projects on their own, or moving to the toplevel directory.  On
the other hand, I expect I'll be proposing removal of disused
and inactive ones from time to time.

If you have new things to add to this area, please first propose
them on the git mailing list, and after a list discussion proves
there is general interest (it does not have to be a
list-wide consensus for a tool targeted to a relatively narrow
audience -- for example I do not work with projects whose
upstream is svn, so I have no use for git-svn myself, but it is
of general interest for people who need to interoperate with SVN
repositories in a way git-svn works better than git-svnimport),
submit a patch to create a subdirectory of contrib/ and put your
stuff there.

-jc