git/t/t5604-clone-reference.sh

#!/bin/sh
#
# Copyright (C) 2006 Martin Waitz <tali@admingilde.org>
#
test_description='test clone --reference'
# This script relies on the initial branch being named "main", so pin the
# name here instead of depending on the default for init.defaultBranch.
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
# Passes under SANITIZE=leak since b0c61be3209 (Merge branch
# 'rs/reflog-expiry-cleanup', 2022-12-26).
TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
base_dir=$(pwd)
U=$base_dir/UPLOAD_LOG
# create a commit in repo $1 with name $2
commit_in () {
	(
		cd "$1" &&
		echo "$2" >"$2" &&
		git add "$2" &&
		git commit -m "$2"
	)
}
# check that there are $2 loose objects in repo $1
test_objcount () {
	echo "$2" >expect &&
	git -C "$1" count-objects >actual.raw &&
	cut -d' ' -f1 <actual.raw >actual &&
	test_cmp expect actual
}
test_expect_success 'preparing first repository' '
	test_create_repo A &&
	commit_in A file1
'
test_expect_success 'preparing second repository' '
	git clone A B &&
	commit_in B file2 &&
	git -C B repack -ad &&
	git -C B prune
'
test_expect_success 'cloning with reference (-l -s)' '
	git clone -l -s --reference B A C
'
test_expect_success 'existence of info/alternates' '
	test_line_count = 2 C/.git/objects/info/alternates
'
test_expect_success 'pulling from reference' '
	git -C C pull ../B main
'
test_expect_success 'that reference gets used' '
	test_objcount C 0
'
# Write the packet trace directly to a file rather than to a file
# descriptor, which bash cannot set up reliably on Windows. The "test -s"
# checks below guard against an empty trace file passing by accident.
test_expect_success 'cloning with reference (no -l -s)' '
	GIT_TRACE_PACKET=$U.D git clone --reference B "file://$(pwd)/A" D
'
test_expect_success 'fetched no objects' '
	test -s "$U.D" &&
	! grep " want" "$U.D"
'
test_expect_success 'existence of info/alternates' '
	test_line_count = 1 D/.git/objects/info/alternates
'
test_expect_success 'pulling from reference' '
	git -C D pull ../B main
'
test_expect_success 'that reference gets used' '
	test_objcount D 0
'
test_expect_success 'updating origin' '
	commit_in A file3 &&
	git -C A repack -ad &&
	git -C A prune
'
test_expect_success 'pulling changes from origin' '
	git -C C pull --no-rebase origin
'
# the 2 local objects are commit and tree from the merge
test_expect_success 'that alternate to origin gets used' '
	test_objcount C 2
'
test_expect_success 'pulling changes from origin' '
	git -C D pull --no-rebase origin
'
# 5 local objects are expected: the file3 blob, the commit in A that adds
# it, and that commit's tree, plus our own tree and the merge commit.
test_expect_success 'check objects expected to exist locally' '
	test_objcount D 5
'
test_expect_success 'preparing alternate repository #1' '
	test_create_repo F &&
	commit_in F file1
'
test_expect_success 'cloning alternate repo #2 and adding changes to repo #1' '
	git clone F G &&
	commit_in F file2
'
test_expect_success 'cloning alternate repo #1, using #2 as reference' '
	git clone --reference G F H
'
test_expect_success 'cloning with reference being subset of source (-l -s)' '
	git clone -l -s --reference A B E
'
test_expect_success 'cloning with multiple references drops duplicates' '
	git clone -s --reference B --reference A --reference B A dups &&
	test_line_count = 2 dups/.git/objects/info/alternates
'
test_expect_success 'clone with reference from a tagged repository' '
	(
		cd A && git tag -a -m tagged HEAD
	) &&
	git clone --reference=A A I
'
test_expect_success 'prepare branched repository' '
	git clone A J &&
	(
		cd J &&
		git checkout -b other main^ &&
		echo other >otherfile &&
		git add otherfile &&
		git commit -m other &&
		git checkout main
	)
'
test_expect_success 'fetch with incomplete alternates' '
	git init K &&
	echo "$base_dir/A/.git/objects" >K/.git/objects/info/alternates &&
	(
		cd K &&
		git remote add J "file://$base_dir/J" &&
		GIT_TRACE_PACKET=$U.K git fetch J
	) &&
	main_object=$(cd A && git for-each-ref --format="%(objectname)" refs/heads/main) &&
	test -s "$U.K" &&
	! grep " want $main_object" "$U.K" &&
	tag_object=$(cd A && git for-each-ref --format="%(objectname)" refs/tags/HEAD) &&
	! grep " want $tag_object" "$U.K"
'
test_expect_success 'clone using repo with gitfile as a reference' '
	git clone --separate-git-dir=L A M &&
	git clone --reference=M A N &&
	echo "$base_dir/L/objects" >expected &&
	test_cmp expected "$base_dir/N/.git/objects/info/alternates"
'
test_expect_success 'clone using repo pointed at by gitfile as reference' '
	git clone --reference=M/.git A O &&
	echo "$base_dir/L/objects" >expected &&
	test_cmp expected "$base_dir/O/.git/objects/info/alternates"
'
test_expect_success 'clone and dissociate from reference' '
	git init P &&
	(
		cd P && test_commit one
	) &&
	git clone P Q &&
	(
		cd Q && test_commit two
	) &&
	git clone --no-local --reference=P Q R &&
	git clone --no-local --reference=P --dissociate Q S &&
	# removing the reference P would corrupt R but not S
	rm -fr P &&
	test_must_fail git -C R fsck &&
	git -C S fsck
'
test_expect_success 'clone, dissociate from partial reference and repack' '
	rm -fr P Q R &&
	git init P &&
	(
		cd P &&
		test_commit one &&
		git repack &&
		test_commit two &&
		git repack
	) &&
	git clone --bare P Q &&
	(
		cd P &&
		git checkout -b second &&
		test_commit three &&
		git repack
	) &&
	git clone --bare --dissociate --reference=P Q R &&
	ls R/objects/pack/*.pack >packs.txt &&
	test_line_count = 1 packs.txt
'
test_expect_success 'clone, dissociate from alternates' '
	rm -fr A B C &&
	test_create_repo A &&
	commit_in A file1 &&
	git clone --reference=A A B &&
	test_line_count = 1 B/.git/objects/info/alternates &&
	git clone --local --dissociate B C &&
	! test -f C/.git/objects/info/alternates &&
	( cd C && git fsck )
'
test_expect_success 'setup repo with garbage in objects/*' '
	git init S &&
	(
		cd S &&
		test_commit A &&
		cd .git/objects &&
		>.some-hidden-file &&
		>some-file &&
		mkdir .some-hidden-dir &&
		>.some-hidden-dir/some-file &&
		>.some-hidden-dir/.some-dot-file &&
		mkdir some-dir &&
		>some-dir/some-file &&
		>some-dir/.some-dot-file
	)
'
test_expect_success 'clone a repo with garbage in objects/*' '
	for option in --local --no-hardlinks --shared --dissociate
	do
		git clone $option S S$option || return 1 &&
		git -C S$option fsck || return 1
	done &&
	find S-* -name "*some*" | sort >actual &&
	cat >expected <<-EOF &&
	S--dissociate/.git/objects/.some-hidden-dir
	S--dissociate/.git/objects/.some-hidden-dir/.some-dot-file
	S--dissociate/.git/objects/.some-hidden-dir/some-file
	S--dissociate/.git/objects/.some-hidden-file
	S--dissociate/.git/objects/some-dir
	S--dissociate/.git/objects/some-dir/.some-dot-file
	S--dissociate/.git/objects/some-dir/some-file
	S--dissociate/.git/objects/some-file
	S--local/.git/objects/.some-hidden-dir
	S--local/.git/objects/.some-hidden-dir/.some-dot-file
	S--local/.git/objects/.some-hidden-dir/some-file
	S--local/.git/objects/.some-hidden-file
	S--local/.git/objects/some-dir
	S--local/.git/objects/some-dir/.some-dot-file
	S--local/.git/objects/some-dir/some-file
	S--local/.git/objects/some-file
	S--no-hardlinks/.git/objects/.some-hidden-dir
	S--no-hardlinks/.git/objects/.some-hidden-dir/.some-dot-file
	S--no-hardlinks/.git/objects/.some-hidden-dir/some-file
	S--no-hardlinks/.git/objects/.some-hidden-file
	S--no-hardlinks/.git/objects/some-dir
	S--no-hardlinks/.git/objects/some-dir/.some-dot-file
	S--no-hardlinks/.git/objects/some-dir/some-file
	S--no-hardlinks/.git/objects/some-file
	EOF
	test_cmp expected actual
'
test_expect_success SYMLINKS 'setup repo with manually symlinked or unknown files at objects/' '
	git init T &&
	(
		cd T &&
		git config gc.auto 0 &&
		test_commit A &&
		git gc &&
		test_commit B &&
		cd .git/objects &&
		mv pack packs &&
		ln -s packs pack &&
		find ?? -type d >loose-dirs &&
		last_loose=$(tail -n 1 loose-dirs) &&
		mv $last_loose a-loose-dir &&
		ln -s a-loose-dir $last_loose &&
		first_loose=$(head -n 1 loose-dirs) &&
		rm -f loose-dirs &&
		cd $first_loose &&
		obj=$(ls *) &&
		mv $obj ../an-object &&
		ln -s ../an-object $obj &&
		cd ../ &&
		echo unknown_content >unknown_file
	) &&
	git -C T fsck &&
	git -C T rev-list --all --objects >T.objects
'
test_expect_success SYMLINKS 'clone repo with symlinked or unknown files at objects/' '
	# None of these options work when cloning locally, since T has
	# symlinks in its `$GIT_DIR/objects` directory
	for option in --local --no-hardlinks --dissociate
	do
		test_must_fail git clone $option T T$option 2>err || return 1 &&
		test_i18ngrep "symlink.*exists" err || return 1
	done &&
	# But `--shared` clones should still work, even when specifying
	# a local path *and* that repository has symlinks present in its
	# `$GIT_DIR/objects` directory.
	git clone --shared T T--shared &&
	git -C T--shared fsck &&
	git -C T--shared rev-list --all --objects >T--shared.objects &&
	test_cmp T.objects T--shared.objects &&
	(
		cd T--shared/.git/objects &&
		find . -type f | sort >../../../T--shared.objects-files.raw &&
		find . -type l | sort >../../../T--shared.objects-symlinks.raw
	) &&
	for raw in $(ls T*.raw)
	do
		sed -e "s!/../!/Y/!; s![0-9a-f]\{38,\}!Z!" -e "/commit-graph/d" \
		    -e "/multi-pack-index/d" -e "/rev/d" <$raw >$raw.de-sha-1 &&
		sort $raw.de-sha-1 >$raw.de-sha || return 1
	done &&
	echo ./info/alternates >expected-files &&
	test_cmp expected-files T--shared.objects-files.raw &&
	test_must_be_empty T--shared.objects-symlinks.raw
'
test_expect_success SYMLINKS 'clone repo with symlinked objects directory' '
	test_when_finished "rm -fr sensitive malicious" &&
	mkdir -p sensitive &&
	echo "secret" >sensitive/file &&
	git init malicious &&
	rm -fr malicious/.git/objects &&
	ln -s "$(pwd)/sensitive" ./malicious/.git/objects &&
	test_must_fail git clone --local malicious clone 2>err &&
	test_path_is_missing clone &&
	grep "is a symlink, refusing to clone with --local" err
'
test_done