git/t/t6120-describe.sh

675 lines
18 KiB
Bash
Raw Normal View History

#!/bin/sh
test_description='test describe'
# o---o-----o----o----o-------o----x
# \ D,R e /
# \---o-------------o-'
# \ B /
# `-o----o----o-'
# A c
#
# First parent of a merge commit is on the same line, second parent below.
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
tests: mark tests relying on the current default for `init.defaultBranch` In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9]*.sh) \ t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167]*.sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-18 23:44:19 +00:00
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
. ./test-lib.sh
check_describe () {
indir= &&
while test $# != 0
do
case "$1" in
-C)
indir="$2"
shift
;;
*)
break
;;
esac
shift
done &&
indir=${indir:+"$indir"/} &&
expect="$1"
shift
describe_opts="$@"
test_expect_success "describe $describe_opts" '
git ${indir:+ -C "$indir"} describe $describe_opts >raw &&
sed -e "s/-g[0-9a-f]*\$/-gHASH/" <raw >actual &&
echo "$expect" >expect &&
test_cmp expect actual
'
}
test_expect_success setup '
test_commit initial file one &&
test_commit second file two &&
test_commit third file three &&
test_commit --annotate A file A &&
test_commit c file c &&
git reset --hard second &&
test_commit --annotate B side B &&
test_tick &&
git merge -m Merged c &&
merged=$(git rev-parse HEAD) &&
git reset --hard second &&
test_commit --no-tag D another D &&
test_tick &&
git tag -a -m R R &&
test_commit e another DD &&
test_commit --no-tag "yet another" another DDD &&
test_tick &&
git merge -m Merged $merged &&
test_commit --no-tag x file
'
check_describe A-8-gHASH HEAD
check_describe A-7-gHASH HEAD^
check_describe R-2-gHASH HEAD^^
check_describe A-3-gHASH HEAD^^2
check_describe B HEAD^^2^
check_describe R-1-gHASH HEAD^^^
check_describe c-7-gHASH --tags HEAD
check_describe c-6-gHASH --tags HEAD^
check_describe e-1-gHASH --tags HEAD^^
check_describe c-2-gHASH --tags HEAD^^2
check_describe B --tags HEAD^^2^
check_describe e --tags HEAD^^^
check_describe e --tags --exact-match HEAD^^^
check_describe heads/main --all HEAD
check_describe tags/c-6-gHASH --all HEAD^
check_describe tags/e --all HEAD^^^
check_describe B-0-gHASH --long HEAD^^2^
check_describe A-3-gHASH --long HEAD^^2
check_describe c-7-gHASH --tags
check_describe e-3-gHASH --first-parent --tags
check_describe c-7-gHASH --tags --no-exact-match HEAD
check_describe e-3-gHASH --first-parent --tags --no-exact-match HEAD
test_expect_success '--exact-match failure' '
test_must_fail git describe --exact-match HEAD 2>err
'
test_expect_success 'describe --contains defaults to HEAD without commit-ish' '
echo "A^0" >expect &&
git checkout A &&
test_when_finished "git checkout -" &&
git describe --contains >actual &&
test_cmp expect actual
'
check_describe tags/A --all A^0
test_expect_success 'renaming tag A to Q locally produces a warning' "
git update-ref refs/tags/Q $(git rev-parse refs/tags/A) &&
git update-ref -d refs/tags/A &&
git describe HEAD 2>err >out &&
cat >expected <<-\EOF &&
warning: tag 'Q' is externally known as 'A'
EOF
test_cmp expected err &&
grep -E '^A-8-g[0-9a-f]+$' out
"
describe: force long format for a name based on a mislocated tag An annotated tag has two names: where it sits in the refs/tags hierarchy and the tagname recorded in the "tag" field in the object itself. They usually should match. Since 212945d4 ("Teach git-describe to verify annotated tag names before output", 2008-02-28), a commit described using an annotated tag bases its name on the tagname from the object. While this was a deliberate design decision to make it easier to converse about tags with others, even if the tags happen to be fetched to a different name than it was given by its creator, it had one downside. The output from "git describe", at least in the modern Git, should be usable as an object name to name the exact commit given to the "git describe" command. Using the tagname, when two names differ, breaks this property, when describing a commit that is directly pointed at by such a tag. An annotated tag Bob made as "v1.0" may sit at "refs/tags/v1.0-bob" in the ref hierarchy, and output from "git describe v1.0-bob^0" would say "v1.0", but there may not be any tag at "refs/tags/v1.0" locally or there may be another tag that points at a different object. Note that this won't be a problem if a commit being described is not directly pointed at by such a mislocated tag. In the example in the previous paragraph, describing a commit whose parent is v1.0-bob would result in "v1.0" (i.e. the tagname taken from the tag object) followed by "-1-gXXXXX" where XXXXX is the abbreviated object name, and a string that ends with "-g" followed by a hexadecimal string is an object name for the object whose name begins with hexadecimal string (as long as it is unique), so it does not matter if the leading part is "v1.0" or "v1.0-bob". Show the name in the long format, i.e. with "-0-gXXXXX" suffix, when the name we give is based on a mislocated annotated tag to ensure that the output can be used as the object name for the object originally given to the command to fix the issue. While at it, remove an overly cautious dead code to protect against an annotated tag object without the tagname. Such a tag is filtered out much earlier in the codeflow, and will not reach this part of the code. Helped-by: Matheus Tavares <matheus.bernardino@usp.br> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-20 17:34:36 +00:00
test_expect_success 'misnamed annotated tag forces long output' '
description=$(git describe --no-long Q^0) &&
expr "$description" : "A-0-g[0-9a-f]*$" &&
git rev-parse --verify "$description" >actual &&
git rev-parse --verify Q^0 >expect &&
test_cmp expect actual
'
test_expect_success 'abbrev=0 will not break misplaced tag (1)' '
description=$(git describe --abbrev=0 Q^0) &&
expr "$description" : "A-0-g[0-9a-f]*$"
'
test_expect_success 'abbrev=0 will not break misplaced tag (2)' '
description=$(git describe --abbrev=0 c^0) &&
expr "$description" : "A-1-g[0-9a-f]*$"
'
test_expect_success 'rename tag Q back to A' '
git update-ref refs/tags/A $(git rev-parse refs/tags/Q) &&
git update-ref -d refs/tags/Q
'
test_expect_success 'pack tag refs' 'git pack-refs'
check_describe A-8-gHASH HEAD
test_expect_success 'describe works from outside repo using --git-dir' '
git clone --bare "$TRASH_DIRECTORY" "$TRASH_DIRECTORY/bare" &&
git --git-dir "$TRASH_DIRECTORY/bare" describe >out &&
grep -E "^A-8-g[0-9a-f]+$" out
'
check_describe "A-8-gHASH" --dirty
Teach "git describe" --dirty option With the --dirty option, git describe works on HEAD but append s"-dirty" iff the contents of the work tree differs from HEAD. E.g. $ git describe --dirty v1.6.5-15-gc274db7 $ echo >> Makefile $ git describe --dirty v1.6.5-15-gc274db7-dirty The --dirty option can also be used to specify what is appended, instead of the default string "-dirty". $ git describe --dirty=.mod v1.6.5-15-gc274db7.mod Many build scripts use `git describe` to produce a version number based on the description of HEAD (on which the work tree is based) + saying that if the build contains uncommitted changes. This patch helps the writing of such scripts since `git describe --dirty` does directly the intended thing. Three possiblities were considered while discussing this new feature: 1. Describe the work tree by default and describe HEAD only if "HEAD" is explicitly specified Pro: does the right thing by default (both for users and for scripts) Pro: other git commands that works on the work tree by default Con: breaks existing scripts used by the Linux kernel and other projects 2. Use --worktree instead of --dirty Pro: does what it says: "git describe --worktree" describes the work tree Con: other commands do not require a --worktree option when working on the work tree (it often is the default mode for them) Con: unusable with an optional value: "git describe --worktree=.mod" is quite unintuitive. 3. Use --dirty as in this patch Pro: makes sense to specify an optional value (what the dirty mark is) Pro: does not have any of the big cons of previous alternatives * does not break scripts * is not inconsistent with other git commands This patch takes the third approach. Signed-off-by: Jean Privat <jean@pryen.org> Acked-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-21 13:35:22 +00:00
test_expect_success 'describe --dirty with --work-tree' '
(
cd "$TEST_DIRECTORY" &&
git --git-dir "$TRASH_DIRECTORY/.git" --work-tree "$TRASH_DIRECTORY" describe --dirty >"$TRASH_DIRECTORY/out"
) &&
grep -E "^A-8-g[0-9a-f]+$" out
'
Teach "git describe" --dirty option With the --dirty option, git describe works on HEAD but append s"-dirty" iff the contents of the work tree differs from HEAD. E.g. $ git describe --dirty v1.6.5-15-gc274db7 $ echo >> Makefile $ git describe --dirty v1.6.5-15-gc274db7-dirty The --dirty option can also be used to specify what is appended, instead of the default string "-dirty". $ git describe --dirty=.mod v1.6.5-15-gc274db7.mod Many build scripts use `git describe` to produce a version number based on the description of HEAD (on which the work tree is based) + saying that if the build contains uncommitted changes. This patch helps the writing of such scripts since `git describe --dirty` does directly the intended thing. Three possiblities were considered while discussing this new feature: 1. Describe the work tree by default and describe HEAD only if "HEAD" is explicitly specified Pro: does the right thing by default (both for users and for scripts) Pro: other git commands that works on the work tree by default Con: breaks existing scripts used by the Linux kernel and other projects 2. Use --worktree instead of --dirty Pro: does what it says: "git describe --worktree" describes the work tree Con: other commands do not require a --worktree option when working on the work tree (it often is the default mode for them) Con: unusable with an optional value: "git describe --worktree=.mod" is quite unintuitive. 3. Use --dirty as in this patch Pro: makes sense to specify an optional value (what the dirty mark is) Pro: does not have any of the big cons of previous alternatives * does not break scripts * is not inconsistent with other git commands This patch takes the third approach. Signed-off-by: Jean Privat <jean@pryen.org> Acked-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-21 13:35:22 +00:00
test_expect_success 'set-up dirty work tree' '
echo >>file
'
test_expect_success 'describe --dirty with --work-tree (dirty)' '
git describe --dirty >expected &&
(
cd "$TEST_DIRECTORY" &&
git --git-dir "$TRASH_DIRECTORY/.git" --work-tree "$TRASH_DIRECTORY" describe --dirty >"$TRASH_DIRECTORY/out"
) &&
grep -E "^A-8-g[0-9a-f]+-dirty$" out &&
test_cmp expected out
'
test_expect_success 'describe --dirty=.mod with --work-tree (dirty)' '
git describe --dirty=.mod >expected &&
(
cd "$TEST_DIRECTORY" &&
git --git-dir "$TRASH_DIRECTORY/.git" --work-tree "$TRASH_DIRECTORY" describe --dirty=.mod >"$TRASH_DIRECTORY/out"
) &&
grep -E "^A-8-g[0-9a-f]+.mod$" out &&
test_cmp expected out
'
Teach "git describe" --dirty option With the --dirty option, git describe works on HEAD but append s"-dirty" iff the contents of the work tree differs from HEAD. E.g. $ git describe --dirty v1.6.5-15-gc274db7 $ echo >> Makefile $ git describe --dirty v1.6.5-15-gc274db7-dirty The --dirty option can also be used to specify what is appended, instead of the default string "-dirty". $ git describe --dirty=.mod v1.6.5-15-gc274db7.mod Many build scripts use `git describe` to produce a version number based on the description of HEAD (on which the work tree is based) + saying that if the build contains uncommitted changes. This patch helps the writing of such scripts since `git describe --dirty` does directly the intended thing. Three possiblities were considered while discussing this new feature: 1. Describe the work tree by default and describe HEAD only if "HEAD" is explicitly specified Pro: does the right thing by default (both for users and for scripts) Pro: other git commands that works on the work tree by default Con: breaks existing scripts used by the Linux kernel and other projects 2. Use --worktree instead of --dirty Pro: does what it says: "git describe --worktree" describes the work tree Con: other commands do not require a --worktree option when working on the work tree (it often is the default mode for them) Con: unusable with an optional value: "git describe --worktree=.mod" is quite unintuitive. 3. Use --dirty as in this patch Pro: makes sense to specify an optional value (what the dirty mark is) Pro: does not have any of the big cons of previous alternatives * does not break scripts * is not inconsistent with other git commands This patch takes the third approach. Signed-off-by: Jean Privat <jean@pryen.org> Acked-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-21 13:35:22 +00:00
test_expect_success 'describe --dirty HEAD' '
test_must_fail git describe --dirty HEAD
'
test_expect_success 'set-up matching pattern tests' '
git tag -a -m test-annotated test-annotated &&
echo >>file &&
test_tick &&
git commit -a -m "one more" &&
git tag test1-lightweight &&
echo >>file &&
test_tick &&
git commit -a -m "yet another" &&
git tag test2-lightweight &&
echo >>file &&
test_tick &&
git commit -a -m "even more"
'
check_describe "test-annotated-3-gHASH" --match="test-*"
check_describe "test1-lightweight-2-gHASH" --tags --match="test1-*"
check_describe "test2-lightweight-1-gHASH" --tags --match="test2-*"
check_describe "test2-lightweight-0-gHASH" --long --tags --match="test2-*" HEAD^
check_describe "test2-lightweight-0-gHASH" --long --tags --match="test1-*" --match="test2-*" HEAD^
check_describe "test2-lightweight-0-gHASH" --long --tags --match="test1-*" --no-match --match="test2-*" HEAD^
check_describe "test1-lightweight-2-gHASH" --long --tags --match="test1-*" --match="test3-*" HEAD
check_describe "test1-lightweight-2-gHASH" --long --tags --match="test3-*" --match="test1-*" HEAD
test_expect_success 'set-up branches' '
git branch branch_A A &&
git branch branch_C c &&
git update-ref refs/remotes/origin/remote_branch_A "A^{commit}" &&
git update-ref refs/remotes/origin/remote_branch_C "c^{commit}" &&
git update-ref refs/original/original_branch_A test-annotated~2
'
check_describe "heads/branch_A-11-gHASH" --all --match="branch_*" --exclude="branch_C" HEAD
check_describe "remotes/origin/remote_branch_A-11-gHASH" --all --match="origin/remote_branch_*" --exclude="origin/remote_branch_C" HEAD
check_describe "original/original_branch_A-6-gHASH" --all test-annotated~1
test_expect_success '--match does not work for other types' '
test_must_fail git describe --all --match="*original_branch_*" test-annotated~1
'
test_expect_success '--exclude does not work for other types' '
R=$(git describe --all --exclude="any_pattern_even_not_matching" test-annotated~1) &&
case "$R" in
*original_branch_A*) echo "fail: Found unknown reference $R with --exclude"
false;;
*) echo ok: Found some known type;;
esac
'
test_expect_success 'name-rev with exact tags' '
echo A >expect &&
tag_object=$(git rev-parse refs/tags/A) &&
git name-rev --tags --name-only $tag_object >actual &&
test_cmp expect actual &&
echo "A^0" >expect &&
tagged_commit=$(git rev-parse "refs/tags/A^0") &&
git name-rev --tags --name-only $tagged_commit >actual &&
test_cmp expect actual
'
test_expect_success 'name-rev --all' '
>expect.unsorted &&
for rev in $(git rev-list --all)
do
git name-rev $rev >>expect.unsorted || return 1
done &&
sort <expect.unsorted >expect &&
git name-rev --all >actual.unsorted &&
sort <actual.unsorted >actual &&
test_cmp expect actual
'
test_expect_success 'name-rev --annotate-stdin' '
>expect.unsorted &&
for rev in $(git rev-list --all)
do
name=$(git name-rev --name-only $rev) &&
echo "$rev ($name)" >>expect.unsorted || return 1
done &&
sort <expect.unsorted >expect &&
git rev-list --all | git name-rev --annotate-stdin >actual.unsorted &&
sort <actual.unsorted >actual &&
test_cmp expect actual
'
test_expect_success 'name-rev --stdin deprecated' "
git rev-list --all | git name-rev --stdin 2>actual &&
grep -E 'warning: --stdin is deprecated' actual
"
test_expect_success 'describe --contains with the exact tags' '
echo "A^0" >expect &&
tag_object=$(git rev-parse refs/tags/A) &&
git describe --contains $tag_object >actual &&
test_cmp expect actual &&
echo "A^0" >expect &&
tagged_commit=$(git rev-parse "refs/tags/A^0") &&
git describe --contains $tagged_commit >actual &&
test_cmp expect actual
'
test_expect_success 'describe --contains and --match' '
echo "A^0" >expect &&
tagged_commit=$(git rev-parse "refs/tags/A^0") &&
test_must_fail git describe --contains --match="B" $tagged_commit &&
git describe --contains --match="B" --match="A" $tagged_commit >actual &&
test_cmp expect actual
'
test_expect_success 'describe --exclude' '
echo "c~1" >expect &&
tagged_commit=$(git rev-parse "refs/tags/A^0") &&
test_must_fail git describe --contains --match="B" $tagged_commit &&
git describe --contains --match="?" --exclude="A" $tagged_commit >actual &&
test_cmp expect actual
'
test_expect_success 'describe --contains and --no-match' '
echo "A^0" >expect &&
tagged_commit=$(git rev-parse "refs/tags/A^0") &&
git describe --contains --match="B" --no-match $tagged_commit >actual &&
test_cmp expect actual
'
test_expect_success 'setup and absorb a submodule' '
test_create_repo sub1 &&
test_commit -C sub1 initial &&
git submodule add ./sub1 &&
git submodule absorbgitdirs &&
git commit -a -m "add submodule" &&
git describe --dirty >expect &&
git describe --broken >out &&
test_cmp expect out
'
test_expect_success 'describe chokes on severely broken submodules' '
mv .git/modules/sub1/ .git/modules/sub_moved &&
test_must_fail git describe --dirty
'
test_expect_success 'describe ignoring a broken submodule' '
git describe --broken >out &&
grep broken out
'
test_expect_success 'describe with --work-tree ignoring a broken submodule' '
(
cd "$TEST_DIRECTORY" &&
git --git-dir "$TRASH_DIRECTORY/.git" --work-tree "$TRASH_DIRECTORY" describe --broken >"$TRASH_DIRECTORY/out"
) &&
test_when_finished "mv .git/modules/sub_moved .git/modules/sub1" &&
grep broken out
'
builtin/describe.c: describe a blob Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1]) When describing commits, we try to anchor them to tags or refs, as these are conceptually on a higher level than the commit. And if there is no ref or tag that matches exactly, we're out of luck. So we employ a heuristic to make up a name for the commit. These names are ambiguous, there might be different tags or refs to anchor to, and there might be different path in the DAG to travel to arrive at the commit precisely. When describing a blob, we want to describe the blob from a higher layer as well, which is a tuple of (commit, deep/path) as the tree objects involved are rather uninteresting. The same blob can be referenced by multiple commits, so how we decide which commit to use? This patch implements a rather naive approach on this: As there are no back pointers from blobs to commits in which the blob occurs, we'll start walking from any tips available, listing the blobs in-order of the commit and once we found the blob, we'll take the first commit that listed the blob. For example git describe --tags v0.99:Makefile conversion-901-g7672db20c2:Makefile tells us the Makefile as it was in v0.99 was introduced in commit 7672db20. The walking is performed in reverse order to show the introduction of a blob rather than its last occurrence. [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-16 02:00:39 +00:00
test_expect_success 'describe a blob at a directly tagged commit' '
echo "make it a unique blob" >file &&
git add file && git commit -m "content in file" &&
git tag -a -m "latest annotated tag" unique-file &&
git describe HEAD:file >actual &&
echo "unique-file:file" >expect &&
test_cmp expect actual
'
test_expect_success 'describe a blob with its first introduction' '
git commit --allow-empty -m "empty commit" &&
git rm file &&
git commit -m "delete blob" &&
git revert HEAD &&
git commit --allow-empty -m "empty commit" &&
git describe HEAD:file >actual &&
echo "unique-file:file" >expect &&
test_cmp expect actual
'
test_expect_success 'describe directly tagged blob' '
git tag test-blob unique-file:file &&
git describe test-blob >actual &&
echo "unique-file:file" >expect &&
# suboptimal: we rather want to see "test-blob"
test_cmp expect actual
'
test_expect_success 'describe tag object' '
git tag test-blob-1 -a -m msg unique-file:file &&
test_must_fail git describe test-blob-1 2>actual &&
test_grep "fatal: test-blob-1 is neither a commit nor blob" actual
builtin/describe.c: describe a blob Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1]) When describing commits, we try to anchor them to tags or refs, as these are conceptually on a higher level than the commit. And if there is no ref or tag that matches exactly, we're out of luck. So we employ a heuristic to make up a name for the commit. These names are ambiguous, there might be different tags or refs to anchor to, and there might be different path in the DAG to travel to arrive at the commit precisely. When describing a blob, we want to describe the blob from a higher layer as well, which is a tuple of (commit, deep/path) as the tree objects involved are rather uninteresting. The same blob can be referenced by multiple commits, so how we decide which commit to use? This patch implements a rather naive approach on this: As there are no back pointers from blobs to commits in which the blob occurs, we'll start walking from any tips available, listing the blobs in-order of the commit and once we found the blob, we'll take the first commit that listed the blob. For example git describe --tags v0.99:Makefile conversion-901-g7672db20c2:Makefile tells us the Makefile as it was in v0.99 was introduced in commit 7672db20. The walking is performed in reverse order to show the introduction of a blob rather than its last occurrence. [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-16 02:00:39 +00:00
'
name-rev: eliminate recursion in name_rev() The name_rev() function calls itself recursively for each interesting parent of the commit it got as parameter, and, consequently, it can segfault when processing a deep history if it exhausts the available stack space. E.g. running 'git name-rev --all' and 'git name-rev HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories results in segfaults on my machine ('ulimit -s' reports 8192kB of stack size limit), and nowadays the former segfaults in the Linux repo as well (it reached the necessasry depth sometime between v5.3-rc4 and -rc5). Eliminate the recursion by inserting the interesting parents into a LIFO 'prio_queue' [1] and iterating until the queue becomes empty. Note that the parent commits must be added in reverse order to the LIFO 'prio_queue', so their relative order is preserved during processing, i.e. the first parent should come out first from the queue, because otherwise performance greatly suffers on mergy histories [2]. The stacksize-limited test 'name-rev works in a deep repo' in 't6120-describe.sh' demonstrated this issue and expected failure. Now the recursion is gone, so flip it to expect success. Also gone are the dmesg entries logging the segfault of that segfaulting 'git name-rev' process on every execution of the test suite. Note that this slightly changes the order of lines in the output of 'git name-rev --all', usually swapping two lines every 35 lines in git.git or every 150 lines in linux.git. This shouldn't matter in practice, because the output has always been unordered anyway. This patch is best viewed with '--ignore-all-space'. [1] Early versions of this patch used a 'commit_list', resulting in ~15% performance penalty for 'git name-rev --all' in 'linux.git', presumably because of the memory allocation and release for each insertion and removal. Using a LIFO 'prio_queue' has basically no effect on performance. [2] We prefer shorter names, i.e. 'v0.1~234' is preferred over 'v0.1^2~5', meaning that usually following the first parent of a merge results in the best name for its ancestors. So when later we follow the remaining parent(s) of a merge, and reach an already named commit, then we usually find that we can't give that commit a better name, and thus we don't have to visit any of its ancestors again. OTOH, if we were to follow the Nth parent of the merge first, then the name of all its ancestors would include a corresponding '^N'. Those are not the best names for those commits, so when later we reach an already named commit following the first parent of that merge, then we would have to update the name of that commit and the names of all of its ancestors as well. Consequently, we would have to visit many commits several times, resulting in a significant slowdown. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 11:52:57 +00:00
test_expect_success ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
i=1 &&
while test $i -lt 8000
do
echo "commit refs/heads/main
committer A U Thor <author@example.com> $((1000000000 + $i * 100)) +0200
data <<EOF
commit #$i
EOF" &&
if test $i = 1
then
echo "from refs/heads/main^0"
fi &&
i=$(($i + 1)) || return 1
done | git fast-import &&
git checkout main &&
git tag far-far-away HEAD^ &&
echo "HEAD~4000 tags/far-far-away~3999" >expect &&
git name-rev HEAD~4000 >actual &&
test_cmp expect actual &&
run_with_limited_stack git name-rev HEAD~4000 >actual &&
test_cmp expect actual
'
test_expect_success ULIMIT_STACK_SIZE 'describe works in a deep repo' '
git tag -f far-far-away HEAD~7999 &&
echo "far-far-away" >expect &&
git describe --tags --abbrev=0 HEAD~4000 >actual &&
test_cmp expect actual &&
run_with_limited_stack git describe --tags --abbrev=0 HEAD~4000 >actual &&
test_cmp expect actual
'
check_describe tags/A --all A
check_describe tags/c --all c
check_describe heads/branch_A --all --match='branch_*' branch_A
describe: confirm that blobs actually exist Prior to 644eb60bd0 (builtin/describe.c: describe a blob, 2017-11-15), we noticed and complained about missing objects, since they were not valid commits: $ git describe 0000000000000000000000000000000000000000 fatal: 0000000000000000000000000000000000000000 is not a valid 'commit' object After that commit, we feed any non-commit to lookup_blob(), and complain only if it returns NULL. But the lookup_* functions do not actually look at the on-disk object database at all. They return an entry from the in-memory object hash if present (and if it matches the requested type), and otherwise auto-create a "struct object" of the requested type. A missing object would hit that latter case: we create a bogus blob struct, walk all of history looking for it, and then exit successfully having produced no output. One reason nobody may have noticed this is that some related cases do still work OK: 1. If we ask for a tree by sha1, then the call to lookup_commit_referecne_gently() would have parsed it, and we would have its true type in the in-memory object hash. 2. If we ask for a name that doesn't exist but isn't a 40-hex sha1, then get_oid() would complain before we even look at the objects at all. We can fix this by replacing the lookup_blob() call with a check of the true type via sha1_object_info(). This is not quite as efficient as we could possibly make this check. We know in most cases that the object was already parsed in the earlier commit lookup, so we could call lookup_object(), which does auto-create, and check the resulting struct's type (or NULL). However it's not worth the fragility nor code complexity to save a single object lookup. The new tests cover this case, as well as that of a tree-by-sha1 (which does work as described above, but was not explicitly tested). Noticed-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net> Acked-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-12 17:23:06 +00:00
test_expect_success 'describe complains about tree object' '
test_must_fail git describe HEAD^{tree}
'
test_expect_success 'describe complains about missing object' '
test_must_fail git describe $ZERO_OID
describe: confirm that blobs actually exist Prior to 644eb60bd0 (builtin/describe.c: describe a blob, 2017-11-15), we noticed and complained about missing objects, since they were not valid commits: $ git describe 0000000000000000000000000000000000000000 fatal: 0000000000000000000000000000000000000000 is not a valid 'commit' object After that commit, we feed any non-commit to lookup_blob(), and complain only if it returns NULL. But the lookup_* functions do not actually look at the on-disk object database at all. They return an entry from the in-memory object hash if present (and if it matches the requested type), and otherwise auto-create a "struct object" of the requested type. A missing object would hit that latter case: we create a bogus blob struct, walk all of history looking for it, and then exit successfully having produced no output. One reason nobody may have noticed this is that some related cases do still work OK: 1. If we ask for a tree by sha1, then the call to lookup_commit_referecne_gently() would have parsed it, and we would have its true type in the in-memory object hash. 2. If we ask for a name that doesn't exist but isn't a 40-hex sha1, then get_oid() would complain before we even look at the objects at all. We can fix this by replacing the lookup_blob() call with a check of the true type via sha1_object_info(). This is not quite as efficient as we could possibly make this check. We know in most cases that the object was already parsed in the earlier commit lookup, so we could call lookup_object(), which does auto-create, and check the resulting struct's type (or NULL). However it's not worth the fragility nor code complexity to save a single object lookup. The new tests cover this case, as well as that of a tree-by-sha1 (which does work as described above, but was not explicitly tested). Noticed-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net> Acked-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-12 17:23:06 +00:00
'
test_expect_success 'name-rev a rev shortly after epoch' '
test_when_finished "git checkout main" &&
git checkout --orphan no-timestamp-underflow &&
# Any date closer to epoch than the CUTOFF_DATE_SLOP constant
# in builtin/name-rev.c.
GIT_COMMITTER_DATE="@1234 +0000" \
git commit -m "committer date shortly after epoch" &&
old_commit_oid=$(git rev-parse HEAD) &&
echo "$old_commit_oid no-timestamp-underflow" >expect &&
git name-rev $old_commit_oid >actual &&
test_cmp expect actual
'
# A--------------main
t6120: add a test to cover inner conditions in 'git name-rev's name_rev() In 'builtin/name-rev.c' in the name_rev() function there is a loop iterating over all parents of the given commit, and the loop body looks like this: if (parent_number > 1) { if (generation > 0) // branch #1 new_name = ... else // branch #2 new_name = ... name_rev(parent, new_name, ...); } else { // branch #3 name_rev(...); } These conditions are not covered properly in the test suite. As far as purely test coverage goes, they are all executed several times over in 't6120-describe.sh'. However, they don't directly influence the command's output, because the repository used in that test script contains several branches and tags pointing somewhere into the middle of the commit DAG, and thus result in a better name for the to-be-named commit. This can hide bugs: e.g. by replacing the 'new_name' parameter of the first recursive name_rev() call with 'tip_name' (effectively making both branch #1 and #2 a noop) 'git name-rev --all' shows thousands of bogus names in the Git repository, but the whole test suite still passes successfully. In an early version of a later patch in this series I managed to mess up all three branches (at once!), but the test suite still passed. So add a new test case that operates on the following history: A--------------master \ / \----------M2 \ / \---M1-C \ / B and names the commit 'B' to make sure that all three branches are crucial to determine 'B's name: - There is only a single ref, so all names are based on 'master', without any undesired interference from other refs. - Each time name_rev() follows the second parent of a merge commit, it appends "^2" to the name. Following 'master's second parent right at the start ensures that all commits on the ancestry path from 'master' to 'B' have a different base name from the original 'tip_name' of the very first name_rev() invocation. Currently, while name_rev() is recursive, it doesn't matter, but it will be necessary to properly cover all three branches after the recursion is eliminated later in this series. - Following 'M2's second parent makes sure that branch #2 (i.e. when 'generation = 0') affects 'B's name. - Following the only parent of the non-merge commit 'C' ensures that branch #3 affects 'B's name, and that it increments 'generation'. - Coming from 'C' 'generation' is 1, thus following 'M1's second parent makes sure that branch #1 affects 'B's name. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-12 10:38:14 +00:00
# \ /
# \----------M2
# \ /
# \---M1-C
# \ /
# B
test_expect_success 'name-rev covers all conditions while looking at parents' '
git init repo &&
(
cd repo &&
echo A >file &&
git add file &&
git commit -m A &&
A=$(git rev-parse HEAD) &&
git checkout --detach &&
echo B >file &&
git commit -m B file &&
B=$(git rev-parse HEAD) &&
git checkout $A &&
git merge --no-ff $B && # M1
echo C >file &&
git commit -m C file &&
git checkout $A &&
git merge --no-ff HEAD@{1} && # M2
git checkout main &&
t6120: add a test to cover inner conditions in 'git name-rev's name_rev() In 'builtin/name-rev.c' in the name_rev() function there is a loop iterating over all parents of the given commit, and the loop body looks like this: if (parent_number > 1) { if (generation > 0) // branch #1 new_name = ... else // branch #2 new_name = ... name_rev(parent, new_name, ...); } else { // branch #3 name_rev(...); } These conditions are not covered properly in the test suite. As far as purely test coverage goes, they are all executed several times over in 't6120-describe.sh'. However, they don't directly influence the command's output, because the repository used in that test script contains several branches and tags pointing somewhere into the middle of the commit DAG, and thus result in a better name for the to-be-named commit. This can hide bugs: e.g. by replacing the 'new_name' parameter of the first recursive name_rev() call with 'tip_name' (effectively making both branch #1 and #2 a noop) 'git name-rev --all' shows thousands of bogus names in the Git repository, but the whole test suite still passes successfully. In an early version of a later patch in this series I managed to mess up all three branches (at once!), but the test suite still passed. So add a new test case that operates on the following history: A--------------master \ / \----------M2 \ / \---M1-C \ / B and names the commit 'B' to make sure that all three branches are crucial to determine 'B's name: - There is only a single ref, so all names are based on 'master', without any undesired interference from other refs. - Each time name_rev() follows the second parent of a merge commit, it appends "^2" to the name. Following 'master's second parent right at the start ensures that all commits on the ancestry path from 'master' to 'B' have a different base name from the original 'tip_name' of the very first name_rev() invocation. Currently, while name_rev() is recursive, it doesn't matter, but it will be necessary to properly cover all three branches after the recursion is eliminated later in this series. - Following 'M2's second parent makes sure that branch #2 (i.e. when 'generation = 0') affects 'B's name. - Following the only parent of the non-merge commit 'C' ensures that branch #3 affects 'B's name, and that it increments 'generation'. - Coming from 'C' 'generation' is 1, thus following 'M1's second parent makes sure that branch #1 affects 'B's name. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-12 10:38:14 +00:00
git merge --no-ff HEAD@{1} &&
echo "$B main^2^2~1^2" >expect &&
t6120: add a test to cover inner conditions in 'git name-rev's name_rev() In 'builtin/name-rev.c' in the name_rev() function there is a loop iterating over all parents of the given commit, and the loop body looks like this: if (parent_number > 1) { if (generation > 0) // branch #1 new_name = ... else // branch #2 new_name = ... name_rev(parent, new_name, ...); } else { // branch #3 name_rev(...); } These conditions are not covered properly in the test suite. As far as purely test coverage goes, they are all executed several times over in 't6120-describe.sh'. However, they don't directly influence the command's output, because the repository used in that test script contains several branches and tags pointing somewhere into the middle of the commit DAG, and thus result in a better name for the to-be-named commit. This can hide bugs: e.g. by replacing the 'new_name' parameter of the first recursive name_rev() call with 'tip_name' (effectively making both branch #1 and #2 a noop) 'git name-rev --all' shows thousands of bogus names in the Git repository, but the whole test suite still passes successfully. In an early version of a later patch in this series I managed to mess up all three branches (at once!), but the test suite still passed. So add a new test case that operates on the following history: A--------------master \ / \----------M2 \ / \---M1-C \ / B and names the commit 'B' to make sure that all three branches are crucial to determine 'B's name: - There is only a single ref, so all names are based on 'master', without any undesired interference from other refs. - Each time name_rev() follows the second parent of a merge commit, it appends "^2" to the name. Following 'master's second parent right at the start ensures that all commits on the ancestry path from 'master' to 'B' have a different base name from the original 'tip_name' of the very first name_rev() invocation. Currently, while name_rev() is recursive, it doesn't matter, but it will be necessary to properly cover all three branches after the recursion is eliminated later in this series. - Following 'M2's second parent makes sure that branch #2 (i.e. when 'generation = 0') affects 'B's name. - Following the only parent of the non-merge commit 'C' ensures that branch #3 affects 'B's name, and that it increments 'generation'. - Coming from 'C' 'generation' is 1, thus following 'M1's second parent makes sure that branch #1 affects 'B's name. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-12 10:38:14 +00:00
git name-rev $B >actual &&
test_cmp expect actual
)
'
name-rev: use generation numbers if available If a commit in a sequence of linear history has a non-monotonically increasing commit timestamp, git name-rev might not properly name the commit. This occurs because name-rev uses a heuristic of the commit date to avoid searching down tags which lead to commits that are older than the named commit. This is intended to avoid work on larger repositories. This heuristic impacts git name-rev, and by extension git describe --contains which is built on top of name-rev. Further more, if --all or --annotate-stdin is used, the heuristic is not enabled because the full history has to be analyzed anyways. This results in some confusion if a user sees that --annotate-stdin works but a normal name-rev does not. If the repository has a commit graph, we can use the generation numbers instead of using the commit dates. This is essentially the same check except that generation numbers make it exact, where the commit date heuristic could be incorrect due to clock errors. Since we're extending the notion of cutoff to more than one variable, create a series of functions for setting and checking the cutoff. This avoids duplication and moves access of the global cutoff and generation_cutoff to as few functions as possible. Add several test cases including a test that covers the new commitGraph behavior, as well as tests for --all and --annotate-stdin with and without commitGraphs. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-12 00:00:15 +00:00
# A-B-C-D-E-main
#
# Where C has a non-monotonically increasing commit timestamp w.r.t. other
# commits
test_expect_success 'non-monotonic commit dates setup' '
UNIX_EPOCH_ZERO="@0 +0000" &&
git init non-monotonic &&
test_commit -C non-monotonic A &&
test_commit -C non-monotonic --no-tag B &&
test_commit -C non-monotonic --no-tag --date "$UNIX_EPOCH_ZERO" C &&
test_commit -C non-monotonic D &&
test_commit -C non-monotonic E
'
test_expect_success 'name-rev with commitGraph handles non-monotonic timestamps' '
test_config -C non-monotonic core.commitGraph true &&
(
cd non-monotonic &&
git commit-graph write --reachable &&
echo "main~3 tags/D~2" >expect &&
git name-rev --tags main~3 >actual &&
test_cmp expect actual
)
'
test_expect_success 'name-rev --all works with non-monotonic timestamps' '
test_config -C non-monotonic core.commitGraph false &&
(
cd non-monotonic &&
rm -rf .git/info/commit-graph* &&
cat >tags <<-\EOF &&
tags/E
tags/D
tags/D~1
tags/D~2
tags/A
EOF
git log --pretty=%H >revs &&
paste -d" " revs tags | sort >expect &&
git name-rev --tags --all | sort >actual &&
test_cmp expect actual
)
'
test_expect_success 'name-rev --annotate-stdin works with non-monotonic timestamps' '
test_config -C non-monotonic core.commitGraph false &&
(
cd non-monotonic &&
rm -rf .git/info/commit-graph* &&
cat >expect <<-\EOF &&
E
D
D~1
D~2
A
EOF
git log --pretty=%H >revs &&
git name-rev --tags --annotate-stdin --name-only <revs >actual &&
test_cmp expect actual
)
'
test_expect_success 'name-rev --all works with commitGraph' '
test_config -C non-monotonic core.commitGraph true &&
(
cd non-monotonic &&
git commit-graph write --reachable &&
cat >tags <<-\EOF &&
tags/E
tags/D
tags/D~1
tags/D~2
tags/A
EOF
git log --pretty=%H >revs &&
paste -d" " revs tags | sort >expect &&
git name-rev --tags --all | sort >actual &&
test_cmp expect actual
)
'
test_expect_success 'name-rev --annotate-stdin works with commitGraph' '
test_config -C non-monotonic core.commitGraph true &&
(
cd non-monotonic &&
git commit-graph write --reachable &&
cat >expect <<-\EOF &&
E
D
D~1
D~2
A
EOF
git log --pretty=%H >revs &&
git name-rev --tags --annotate-stdin --name-only <revs >actual &&
test_cmp expect actual
)
'
2020-02-26 17:48:53 +00:00
# B
# o
# \
# o-----o---o----x
# A
#
test_expect_success 'setup: describe commits with disjoint bases' '
2020-02-26 17:48:53 +00:00
git init disjoint1 &&
(
cd disjoint1 &&
echo o >> file && git add file && git commit -m o &&
echo A >> file && git add file && git commit -m A &&
git tag A -a -m A &&
echo o >> file && git add file && git commit -m o &&
git checkout --orphan branch && rm file &&
echo B > file2 && git add file2 && git commit -m B &&
git tag B -a -m B &&
git merge --no-ff --allow-unrelated-histories main -m x
2020-02-26 17:48:53 +00:00
)
'
check_describe -C disjoint1 "A-3-gHASH" HEAD
2020-02-26 17:48:53 +00:00
# B
# o---o---o------------.
# \
# o---o---x
# A
#
test_expect_success 'setup: describe commits with disjoint bases 2' '
2020-02-26 17:48:53 +00:00
git init disjoint2 &&
(
cd disjoint2 &&
echo A >> file && git add file && GIT_COMMITTER_DATE="2020-01-01 18:00" git commit -m A &&
git tag A -a -m A &&
echo o >> file && git add file && GIT_COMMITTER_DATE="2020-01-01 18:01" git commit -m o &&
git checkout --orphan branch &&
echo o >> file2 && git add file2 && GIT_COMMITTER_DATE="2020-01-01 15:00" git commit -m o &&
echo o >> file2 && git add file2 && GIT_COMMITTER_DATE="2020-01-01 15:01" git commit -m o &&
echo B >> file2 && git add file2 && GIT_COMMITTER_DATE="2020-01-01 15:02" git commit -m B &&
git tag B -a -m B &&
git merge --no-ff --allow-unrelated-histories main -m x
2020-02-26 17:48:53 +00:00
)
'
check_describe -C disjoint2 "B-3-gHASH" HEAD
name-rev: fix names by dropping taggerdate workaround Commit 7550424804 ("name-rev: include taggerdate in considering the best name", 2016-04-22) introduced the idea of using taggerdate in the criteria for selecting the best name. At the time, a certain commit in linux.git -- namely, aed06b9cfcab -- was being named by name-rev as v4.6-rc1~9^2~792 which, while correct, was very suboptimal. Some investigation found that tweaking the MERGE_TRAVERSAL_WEIGHT to lower it could give alternate answers such as v3.13-rc7~9^2~14^2~42 or v3.13~5^2~4^2~2^2~1^2~42 A manual solution involving looking at tagger dates came up with v3.13-rc1~65^2^2~42 which is much nicer. That workaround was then implemented in name-rev. Unfortunately, the taggerdate heuristic is causing bugs. I was pointed to a case in a private repository where name-rev reports a name of the form v2022.10.02~86 when users expected to see one of the form v2022.10.01~2 (I've modified the names and numbers a bit from the real testcase.) As you can probably guess, v2022.10.01 was created after v2022.10.02 (by a few hours), even though it pointed to an older commit. While the condition is unusual even in the repository in question, it is not the only problematic set of tags in that repository. The taggerdate logic is causing problems. Further, it turns out that this taggerdate heuristic isn't even helping anymore. Due to the fix to naming logic in 3656f84278 ("name-rev: prefer shorter names over following merges", 2021-12-04), we get improved names without the taggerdate heuristic. For the original commit of interest in linux.git, a modern git without the taggerdate heuristic still provides the same optimal answer of interest, namely: v3.13-rc1~65^2^2~42 So, the taggerdate is no longer providing benefit, and it is causing problems. Simply get rid of it. However, note that "taggerdate" as a variable is used to store things besides a taggerdate these days. Ever since commit ef1e74065c ("name-rev: favor describing with tags and use committer date to tiebreak", 2017-03-29), this has been used to store committer dates and there it is used as a fallback tiebreaker (as opposed to a primary criteria overriding effective distance calculations). We do not want to remove that fallback tiebreaker, so not all instances of "taggerdate" are removed in this change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-09 09:11:46 +00:00
test_expect_success 'setup misleading taggerdates' '
GIT_COMMITTER_DATE="2006-12-12 12:31" git tag -a -m "another tag" newer-tag-older-commit unique-file~1
'
check_describe newer-tag-older-commit~1 --contains unique-file~2
test_done