git/t/t5558-clone-bundle-uri.sh

1256 lines
33 KiB
Bash
Raw Permalink Normal View History

#!/bin/sh
test_description='test fetching bundles with --bundle-uri'
. ./test-lib.sh
bundle-uri: verify oid before writing refs When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why the "refs/bundles/main" is not discovered? After some digging I found that: 1. Bundles in bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` to by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbunding "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For files backend, it is `files-backend.c:files_transaction_prepare`, and for reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak <karthik.188@gmail.com> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-19 04:07:31 +00:00
. "$TEST_DIRECTORY"/lib-bundle.sh
test_expect_success 'fail to clone from non-existent file' '
test_when_finished rm -rf test &&
git clone --bundle-uri="$(pwd)/does-not-exist" . test 2>err &&
grep "failed to download bundle from URI" err
'
test_expect_success 'fail to clone from non-bundle file' '
test_when_finished rm -rf test &&
echo bogus >bogus &&
git clone --bundle-uri="$(pwd)/bogus" . test 2>err &&
grep "is not a bundle" err
'
test_expect_success 'create bundle' '
git init clone-from &&
bundle-uri: verify oid before writing refs When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why the "refs/bundles/main" is not discovered? After some digging I found that: 1. Bundles in bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` to by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbunding "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For files backend, it is `files-backend.c:files_transaction_prepare`, and for reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak <karthik.188@gmail.com> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-19 04:07:31 +00:00
(
cd clone-from &&
git checkout -b topic &&
test_commit A &&
git bundle create A.bundle topic &&
test_commit B &&
git bundle create B.bundle topic &&
# Create a bundle with reference pointing to non-existent object.
commit_a=$(git rev-parse A) &&
commit_b=$(git rev-parse B) &&
sed -e "/^$/q" -e "s/$commit_a /$commit_b /" \
<A.bundle >bad-header.bundle &&
convert_bundle_to_pack \
<A.bundle >>bad-header.bundle &&
tree_b=$(git rev-parse B^{tree}) &&
cat >data <<-EOF &&
tree $tree_b
parent $commit_b
author A U Thor
committer A U Thor
commit: this is a commit with bad emails
EOF
bad_commit=$(git hash-object --literally -t commit -w --stdin <data) &&
git branch bad $bad_commit &&
git bundle create bad-object.bundle bad &&
git update-ref -d refs/heads/bad
bundle-uri: verify oid before writing refs When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why the "refs/bundles/main" is not discovered? After some digging I found that: 1. Bundles in bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` to by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbunding "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For files backend, it is `files-backend.c:files_transaction_prepare`, and for reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak <karthik.188@gmail.com> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-19 04:07:31 +00:00
)
'
test_expect_success 'clone with path bundle' '
git clone --bundle-uri="clone-from/B.bundle" \
clone-from clone-path &&
git -C clone-path rev-parse refs/bundles/topic >actual &&
git -C clone-from rev-parse topic >expect &&
test_cmp expect actual
'
bundle-uri: verify oid before writing refs When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why the "refs/bundles/main" is not discovered? After some digging I found that: 1. Bundles in bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` to by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbunding "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For files backend, it is `files-backend.c:files_transaction_prepare`, and for reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak <karthik.188@gmail.com> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-19 04:07:31 +00:00
test_expect_success 'clone with bundle that has bad header' '
# Write bundle ref fails, but clone can still proceed.
git clone --bundle-uri="clone-from/bad-header.bundle" \
clone-from clone-bad-header 2>err &&
commit_b=$(git -C clone-from rev-parse B) &&
test_grep "trying to write ref '\''refs/bundles/topic'\'' with nonexistent object $commit_b" err &&
git -C clone-bad-header for-each-ref --format="%(refname)" >refs &&
test_grep ! "refs/bundles/" refs
'
test_expect_success 'clone with bundle that has bad object' '
# Unbundle succeeds if no fsckObjects configured.
git clone --bundle-uri="clone-from/bad-object.bundle" \
clone-from clone-bad-object-no-fsck &&
git -C clone-bad-object-no-fsck for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
test_write_lines refs/bundles/bad >expect &&
test_cmp expect actual &&
# Unbundle fails with fsckObjects set true, but clone can still proceed.
git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad-object.bundle" \
clone-from clone-bad-object-fsck 2>err &&
test_grep "missingEmail" err &&
git -C clone-bad-object-fsck for-each-ref --format="%(refname)" >refs &&
test_grep ! "refs/bundles/" refs
bundle-uri: verify oid before writing refs When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why the "refs/bundles/main" is not discovered? After some digging I found that: 1. Bundles in bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` to by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbunding "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For files backend, it is `files-backend.c:files_transaction_prepare`, and for reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak <karthik.188@gmail.com> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-19 04:07:31 +00:00
'
builtin/clone: fix bundle URIs with mismatching object formats We create the reference database in git-clone(1) quite early before connecting to the remote repository. Given that we do not yet know about the object format that the remote repository uses at that point in time the consequence is that the refdb may be initialized with the wrong object format. This is not a problem in the context of the files backend as we do not encode the object format anywhere, and furthermore the only reference that we write between initializing the refdb and learning about the object format is the "HEAD" symref. It will become a problem though once we land the reftable backend, which indeed does require to know about the proper object format at the time of creation. We thus need to rearrange the logic in git-clone(1) so that we only initialize the refdb once we have learned about the actual object format. As a first step, move listing of remote references to happen earlier, which also allow us to set up the hash algorithm of the repository earlier now. While we aim to execute this logic as late as possible until after most of the setup has happened already, detection of the object format and thus later the setup of the reference database must happen before any other logic that may spawn Git commands or otherwise these Git commands may not recognize the repository as such. The first Git step where we expect the repository to be fully initalized is when we fetch bundles via bundle URIs. Funny enough, the comments there also state that "the_repository must match the cloned repo", which is indeed not necessarily the case for the hash algorithm right now. So in practice it is the right thing to detect the remote's object format before downloading bundle URIs anyway, and not doing so causes clones with bundle URIs to fail when the local default object format does not match the remote repository's format. Unfortunately though, this creates a new issue: downloading bundles may take a long time, so if we list refs beforehand they might've grown stale meanwhile. It is not clear how to solve this issue except for a second reference listing though after we have downloaded the bundles, which may be an expensive thing to do. Arguably though, it's preferable to have a staleness issue compared to being unable to clone a repository altogether. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-12 07:00:54 +00:00
test_expect_success 'clone with path bundle and non-default hash' '
test_when_finished "rm -rf clone-path-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
clone-from clone-path-non-default-hash &&
git -C clone-path-non-default-hash rev-parse refs/bundles/topic >actual &&
git -C clone-from rev-parse topic >expect &&
test_cmp expect actual
'
test_expect_success 'clone with file:// bundle' '
git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
clone-from clone-file &&
git -C clone-file rev-parse refs/bundles/topic >actual &&
git -C clone-from rev-parse topic >expect &&
test_cmp expect actual
'
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
# To get interesting tests for bundle lists, we need to construct a
# somewhat-interesting commit history.
#
# ---------------- bundle-4
#
# 4
# / \
# ----|---|------- bundle-3
# | |
# | 3
# | |
# ----|---|------- bundle-2
# | |
# 2 |
# | |
# ----|---|------- bundle-1
# \ /
# 1
# |
# (previous commits)
test_expect_success 'construct incremental bundle list' '
(
cd clone-from &&
git checkout -b base &&
test_commit 1 &&
git checkout -b left &&
test_commit 2 &&
git checkout -b right base &&
test_commit 3 &&
git checkout -b merge left &&
git merge right -m "4" &&
git bundle create bundle-1.bundle base &&
git bundle create bundle-2.bundle base..left &&
git bundle create bundle-3.bundle base..right &&
git bundle create bundle-4.bundle merge --not left right
)
'
test_expect_success 'clone bundle list (file, no heuristic)' '
cat >bundle-list <<-EOF &&
[bundle]
version = 1
mode = all
[bundle "bundle-1"]
uri = file://$(pwd)/clone-from/bundle-1.bundle
[bundle "bundle-2"]
uri = file://$(pwd)/clone-from/bundle-2.bundle
[bundle "bundle-3"]
uri = file://$(pwd)/clone-from/bundle-3.bundle
[bundle "bundle-4"]
uri = file://$(pwd)/clone-from/bundle-4.bundle
EOF
git clone --bundle-uri="file://$(pwd)/bundle-list" \
clone-from clone-list-file 2>err &&
! grep "Repository lacks these prerequisite commits" err &&
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
git -C clone-from for-each-ref --format="%(objectname)" >oids &&
git -C clone-list-file cat-file --batch-check <oids &&
git -C clone-list-file for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
cat >expect <<-\EOF &&
refs/bundles/base
refs/bundles/left
refs/bundles/merge
refs/bundles/right
EOF
test_cmp expect actual
'
test_expect_success 'clone bundle list (file, all mode, some failures)' '
cat >bundle-list <<-EOF &&
[bundle]
version = 1
mode = all
# Does not exist. Should be skipped.
[bundle "bundle-0"]
uri = file://$(pwd)/clone-from/bundle-0.bundle
[bundle "bundle-1"]
uri = file://$(pwd)/clone-from/bundle-1.bundle
[bundle "bundle-2"]
uri = file://$(pwd)/clone-from/bundle-2.bundle
# No bundle-3 means bundle-4 will not apply.
[bundle "bundle-4"]
uri = file://$(pwd)/clone-from/bundle-4.bundle
# Does not exist. Should be skipped.
[bundle "bundle-5"]
uri = file://$(pwd)/clone-from/bundle-5.bundle
EOF
GIT_TRACE2_PERF=1 \
git clone --bundle-uri="file://$(pwd)/bundle-list" \
clone-from clone-all-some 2>err &&
! grep "Repository lacks these prerequisite commits" err &&
! grep "fatal" err &&
grep "warning: failed to download bundle from URI" err &&
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
git -C clone-from for-each-ref --format="%(objectname)" >oids &&
git -C clone-all-some cat-file --batch-check <oids &&
git -C clone-all-some for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
cat >expect <<-\EOF &&
refs/bundles/base
refs/bundles/left
EOF
test_cmp expect actual
'
test_expect_success 'clone bundle list (file, all mode, all failures)' '
cat >bundle-list <<-EOF &&
[bundle]
version = 1
mode = all
# Does not exist. Should be skipped.
[bundle "bundle-0"]
uri = file://$(pwd)/clone-from/bundle-0.bundle
# Does not exist. Should be skipped.
[bundle "bundle-5"]
uri = file://$(pwd)/clone-from/bundle-5.bundle
EOF
git clone --bundle-uri="file://$(pwd)/bundle-list" \
clone-from clone-all-fail 2>err &&
! grep "Repository lacks these prerequisite commits" err &&
! grep "fatal" err &&
grep "warning: failed to download bundle from URI" err &&
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
git -C clone-from for-each-ref --format="%(objectname)" >oids &&
git -C clone-all-fail cat-file --batch-check <oids &&
git -C clone-all-fail for-each-ref --format="%(refname)" >refs &&
! grep "refs/bundles/" refs
'
test_expect_success 'clone bundle list (file, any mode)' '
cat >bundle-list <<-EOF &&
[bundle]
version = 1
mode = any
# Does not exist. Should be skipped.
[bundle "bundle-0"]
uri = file://$(pwd)/clone-from/bundle-0.bundle
[bundle "bundle-1"]
uri = file://$(pwd)/clone-from/bundle-1.bundle
# Does not exist. Should be skipped.
[bundle "bundle-5"]
uri = file://$(pwd)/clone-from/bundle-5.bundle
EOF
git clone --bundle-uri="file://$(pwd)/bundle-list" \
clone-from clone-any-file 2>err &&
! grep "Repository lacks these prerequisite commits" err &&
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
git -C clone-from for-each-ref --format="%(objectname)" >oids &&
git -C clone-any-file cat-file --batch-check <oids &&
git -C clone-any-file for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
cat >expect <<-\EOF &&
refs/bundles/base
EOF
test_cmp expect actual
'
test_expect_success 'clone bundle list (file, any mode, all failures)' '
cat >bundle-list <<-EOF &&
[bundle]
version = 1
mode = any
# Does not exist. Should be skipped.
[bundle "bundle-0"]
uri = $HTTPD_URL/bundle-0.bundle
# Does not exist. Should be skipped.
[bundle "bundle-5"]
uri = $HTTPD_URL/bundle-5.bundle
EOF
git clone --bundle-uri="file://$(pwd)/bundle-list" \
clone-from clone-any-fail 2>err &&
! grep "fatal" err &&
grep "warning: failed to download bundle from URI" err &&
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
git -C clone-from for-each-ref --format="%(objectname)" >oids &&
git -C clone-any-fail cat-file --batch-check <oids &&
git -C clone-any-fail for-each-ref --format="%(refname)" >refs &&
! grep "refs/bundles/" refs
'
bundle-uri: verify oid before writing refs When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not discovered, thus resulting in rather slow clones. This was particularly problematic when employing the "creationTokens" heuristic. To reproduce this issue, consider a repository with a single branch "main" pointing to commit "A". Firstly, create a base bundle with: git bundle create base.bundle main Then, add a new commit "B" on top of "A", and create an incremental bundle for "main": git bundle create incr.bundle A..main Now, generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above should result in a reference "refs/bundles/main" pointing to "B" in the new repository. However, git would still download everything from the server, as if it had fetched nothing locally. So why the "refs/bundles/main" is not discovered? After some digging I found that: 1. Bundles in bundle list are downloaded to local files via `bundle-uri.c:download_bundle_list` or via `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 2. Each bundle is unbundled via `bundle-uri.c:unbundle_from_file`, which is called by `bundle-uri.c:unbundle_all_bundles` or called within `bundle-uri.c:fetch_bundles_by_token` for the "creationToken" heuristic. 3. To get all prerequisites of the bundle, the bundle header is read inside `bundle-uri.c:unbundle_from_file` to by calling `bundle.c:read_bundle_header`. 4. Then it calls `bundle.c:unbundle`, which calls `bundle.c:verify_bundle` to ensure the repository contains all the prerequisites. 5. `bundle.c:verify_bundle` calls `parse_object`, which eventually invokes `packfile.c:prepare_packed_git` or `packfile.c:reprepare_packed_git`, filling `raw_object_store->packed_git` and setting `packed_git_initialized`. 6. If `bundle.c:unbundle` succeeds, it writes refs via `refs.c:refs_update_ref` with `REF_SKIP_OID_VERIFICATION` set. Here bundle refs which can target arbitrary objects are written to the repository. 7. Finally, in `fetch-pack.c:do_fetch_pack_v2`, the functions `fetch-pack.c:mark_complete_and_common_ref` and `fetch-pack.c:mark_tips` are called with `OBJECT_INFO_QUICK` set to find local tips for negotiation. The `OBJECT_INFO_QUICK` flag prevents `packfile.c:reprepare_packed_git` from being called, resulting in failures to parse OIDs that reside only in the latest bundle. In the example above, when unbunding "incr.bundle", "base.pack" is added to `packed_git` due to prerequisites verification. However, "B" cannot be found for negotiation because it exists in "incr.pack", which is not included in `packed_git`. Fix the bug by removing `REF_SKIP_OID_VERIFICATION` flag when writing bundle refs. When `refs.c:refs_update_ref` is called to write the corresponding bundle refs, it triggers `refs.c:ref_transaction_commit`. This, in turn, invokes `refs.c:ref_transaction_prepare`, which calls `transaction_prepare` of the refs storage backend. For files backend, it is `files-backend.c:files_transaction_prepare`, and for reftable backend, it is `reftable-backend.c:reftable_be_transaction_prepare`. Both functions eventually call `object.c:parse_object`, which can invoke `packfile.c:reprepare_packed_git` to refresh `packed_git`. This ensures that bundle refs point to valid objects and that all tips from bundle refs are correctly parsed during subsequent negotiations. A set of negotiation-related tests for cloning with bundle-uri has been included to demonstrate that downloaded bundles are utilized to accelerate fetching. Additionally, another test has been added to show that bundles with incorrect headers, where refs point to non-existent objects, do not result in any bundle refs being created in the repository. Reviewed-by: Karthik Nayak <karthik.188@gmail.com> Reviewed-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-19 04:07:31 +00:00
test_expect_success 'negotiation: bundle with part of wanted commits' '
test_when_finished "rm -f trace*.txt" &&
GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
git clone --no-local --bundle-uri="clone-from/A.bundle" \
clone-from nego-bundle-part &&
git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
test_write_lines refs/bundles/topic >expect &&
test_cmp expect actual &&
# Ensure that refs/bundles/topic are sent as "have".
tip=$(git -C clone-from rev-parse A) &&
test_grep "clone> have $tip" trace-packet.txt
'
test_expect_success 'negotiation: bundle with all wanted commits' '
test_when_finished "rm -f trace*.txt" &&
GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
git clone --no-local --single-branch --branch=topic --no-tags \
--bundle-uri="clone-from/B.bundle" \
clone-from nego-bundle-all &&
git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
test_write_lines refs/bundles/topic >expect &&
test_cmp expect actual &&
# We already have all needed commits so no "want" needed.
test_grep ! "clone> want " trace-packet.txt
'
test_expect_success 'negotiation: bundle list (no heuristic)' '
test_when_finished "rm -f trace*.txt" &&
cat >bundle-list <<-EOF &&
[bundle]
version = 1
mode = all
[bundle "bundle-1"]
uri = file://$(pwd)/clone-from/bundle-1.bundle
[bundle "bundle-2"]
uri = file://$(pwd)/clone-from/bundle-2.bundle
EOF
GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
clone-from nego-bundle-list-no-heuristic &&
git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
cat >expect <<-\EOF &&
refs/bundles/base
refs/bundles/left
EOF
test_cmp expect actual &&
tip=$(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left) &&
test_grep "clone> have $tip" trace-packet.txt
'
test_expect_success 'negotiation: bundle list (creationToken)' '
test_when_finished "rm -f trace*.txt" &&
cat >bundle-list <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = file://$(pwd)/clone-from/bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = file://$(pwd)/clone-from/bundle-2.bundle
creationToken = 2
EOF
GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
clone-from nego-bundle-list-heuristic &&
git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
cat >expect <<-\EOF &&
refs/bundles/base
refs/bundles/left
EOF
test_cmp expect actual &&
tip=$(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left) &&
test_grep "clone> have $tip" trace-packet.txt
'
test_expect_success 'negotiation: bundle list with all wanted commits' '
test_when_finished "rm -f trace*.txt" &&
cat >bundle-list <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = file://$(pwd)/clone-from/bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = file://$(pwd)/clone-from/bundle-2.bundle
creationToken = 2
EOF
GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
git clone --no-local --single-branch --branch=left --no-tags \
--bundle-uri="file://$(pwd)/bundle-list" \
clone-from nego-bundle-list-all &&
git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
cat >expect <<-\EOF &&
refs/bundles/base
refs/bundles/left
EOF
test_cmp expect actual &&
# We already have all needed commits so no "want" needed.
test_grep ! "clone> want " trace-packet.txt
'
#########################################################################
# HTTP tests begin here
. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd
test_expect_success 'fail to fetch from non-existent HTTP URL' '
test_when_finished rm -rf test &&
git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
grep "failed to download bundle from URI" err
'
test_expect_success 'fail to fetch from non-bundle HTTP URL' '
test_when_finished rm -rf test &&
echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
grep "is not a bundle" err
'
test_expect_success 'clone HTTP bundle' '
cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
git clone --no-local --mirror clone-from \
"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
git clone --bundle-uri="$HTTPD_URL/B.bundle" \
"$HTTPD_URL/smart/fetch.git" clone-http &&
git -C clone-http rev-parse refs/bundles/topic >actual &&
git -C clone-from rev-parse topic >expect &&
test_cmp expect actual &&
test_config -C clone-http log.excludedecoration refs/bundle/
'
builtin/clone: fix bundle URIs with mismatching object formats We create the reference database in git-clone(1) quite early before connecting to the remote repository. Given that we do not yet know about the object format that the remote repository uses at that point in time the consequence is that the refdb may be initialized with the wrong object format. This is not a problem in the context of the files backend as we do not encode the object format anywhere, and furthermore the only reference that we write between initializing the refdb and learning about the object format is the "HEAD" symref. It will become a problem though once we land the reftable backend, which indeed does require to know about the proper object format at the time of creation. We thus need to rearrange the logic in git-clone(1) so that we only initialize the refdb once we have learned about the actual object format. As a first step, move listing of remote references to happen earlier, which also allow us to set up the hash algorithm of the repository earlier now. While we aim to execute this logic as late as possible until after most of the setup has happened already, detection of the object format and thus later the setup of the reference database must happen before any other logic that may spawn Git commands or otherwise these Git commands may not recognize the repository as such. The first Git step where we expect the repository to be fully initalized is when we fetch bundles via bundle URIs. Funny enough, the comments there also state that "the_repository must match the cloned repo", which is indeed not necessarily the case for the hash algorithm right now. So in practice it is the right thing to detect the remote's object format before downloading bundle URIs anyway, and not doing so causes clones with bundle URIs to fail when the local default object format does not match the remote repository's format. Unfortunately though, this creates a new issue: downloading bundles may take a long time, so if we list refs beforehand they might've grown stale meanwhile. It is not clear how to solve this issue except for a second reference listing though after we have downloaded the bundles, which may be an expensive thing to do. Arguably though, it's preferable to have a staleness issue compared to being unable to clone a repository altogether. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-12 07:00:54 +00:00
test_expect_success 'clone HTTP bundle with non-default hash' '
test_when_finished "rm -rf clone-http-non-default-hash" &&
GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="$HTTPD_URL/B.bundle" \
"$HTTPD_URL/smart/fetch.git" clone-http-non-default-hash &&
git -C clone-http-non-default-hash rev-parse refs/bundles/topic >actual &&
git -C clone-from rev-parse topic >expect &&
test_cmp expect actual
'
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
test_expect_success 'clone bundle list (HTTP, no heuristic)' '
test_when_finished rm -f trace*.txt &&
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
[bundle "bundle-1"]
uri = $HTTPD_URL/bundle-1.bundle
[bundle "bundle-2"]
uri = $HTTPD_URL/bundle-2.bundle
[bundle "bundle-3"]
uri = $HTTPD_URL/bundle-3.bundle
[bundle "bundle-4"]
uri = $HTTPD_URL/bundle-4.bundle
EOF
GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
git clone --bundle-uri="$HTTPD_URL/bundle-list" \
clone-from clone-list-http 2>err &&
! grep "Repository lacks these prerequisite commits" err &&
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
git -C clone-from for-each-ref --format="%(objectname)" >oids &&
git -C clone-list-http cat-file --batch-check <oids &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-1.bundle
$HTTPD_URL/bundle-2.bundle
$HTTPD_URL/bundle-3.bundle
$HTTPD_URL/bundle-4.bundle
$HTTPD_URL/bundle-list
EOF
# Sort the list, since the order is not well-defined
# without a heuristic.
test_remote_https_urls <trace-clone.txt | sort >actual &&
test_cmp expect actual
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
'
test_expect_success 'clone bundle list (HTTP, any mode)' '
cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = any
# Does not exist. Should be skipped.
[bundle "bundle-0"]
uri = $HTTPD_URL/bundle-0.bundle
[bundle "bundle-1"]
uri = $HTTPD_URL/bundle-1.bundle
# Does not exist. Should be skipped.
[bundle "bundle-5"]
uri = $HTTPD_URL/bundle-5.bundle
EOF
git clone --bundle-uri="$HTTPD_URL/bundle-list" \
clone-from clone-any-http 2>err &&
! grep "fatal" err &&
grep "warning: failed to download bundle from URI" err &&
bundle-uri: fetch a list of bundles When the content at a given bundle URI is not understood as a bundle (based on inspecting the initial content), then Git currently gives up and ignores that content. Independent bundle providers may want to split up the bundle content into multiple bundles, but still make them available from a single URI. Teach Git to attempt parsing the bundle URI content as a Git config file providing the key=value pairs for a bundle list. Git then looks at the mode of the list to see if ANY single bundle is sufficient or if ALL bundles are required. The content at the selected URIs are downloaded and the content is inspected again, creating a recursive process. To guard the recursion against malformed or malicious content, limit the recursion depth to a reasonable four for now. This can be converted to a configured value in the future if necessary. The value of four is twice as high as expected to be useful (a bundle list is unlikely to point to more bundle lists). To test this scenario, create an interesting bundle topology where three incremental bundles are built on top of a single full bundle. By using a merge commit, the two middle bundles are "independent" in that they do not require each other in order to unbundle themselves. They each only need the base bundle. The bundle containing the merge commit requires both of the middle bundles, though. This leads to some interesting decisions when unbundling, especially when we later implement heuristics that promote downloading bundles until the prerequisite commits are satisfied. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-12 12:52:36 +00:00
git -C clone-from for-each-ref --format="%(objectname)" >oids &&
git -C clone-any-http cat-file --batch-check <oids &&
git -C clone-list-file for-each-ref --format="%(refname)" >refs &&
grep "refs/bundles/" refs >actual &&
cat >expect <<-\EOF &&
refs/bundles/base
refs/bundles/left
refs/bundles/merge
refs/bundles/right
EOF
test_cmp expect actual
'
test_expect_success 'clone bundle list (http, creationToken)' '
test_when_finished rm -f trace*.txt &&
cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
[bundle "bundle-4"]
uri = bundle-4.bundle
creationToken = 4
EOF
GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
clone --bundle-uri="$HTTPD_URL/bundle-list" \
"$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
git -C clone-from for-each-ref --format="%(objectname)" >oids &&
git -C clone-list-http-2 cat-file --batch-check <oids &&
cat >expect <<-EOF &&
bundle-uri: download in creationToken order The creationToken heuristic provides an ordering on the bundles advertised by a bundle list. Teach the Git client to download bundles differently when this heuristic is advertised. The bundles in the list are sorted by their advertised creationToken values, then downloaded in decreasing order. This avoids the previous strategy of downloading bundles in an arbitrary order and attempting to apply them (likely failing in the case of required commits) until discovering the order through attempted unbundling. During a fresh 'git clone', it may make sense to download the bundles in increasing order, since that would prevent the need to attempt unbundling a bundle with required commits that do not exist in our empty object store. The cost of testing an unbundle is quite low, and instead the chosen order is optimizing for a future bundle download during a 'git fetch' operation with a non-empty object store. Since the Git client continues fetching from the Git remote after downloading and unbundling bundles, the client's object store can be ahead of the bundle provider's object store. The next time it attempts to download from the bundle list, it makes most sense to download only the most-recent bundles until all tips successfully unbundle. The strategy implemented here provides that short-circuit where the client downloads a minimal set of bundles. However, we are not satisfied by the naive approach of downloading bundles until one successfully unbundles, expecting the earlier bundles to successfully unbundle now. The example repository in t5558 demonstrates this well: ---------------- bundle-4 4 / \ ----|---|------- bundle-3 | | | 3 | | ----|---|------- bundle-2 | | 2 | | | ----|---|------- bundle-1 \ / 1 | (previous commits) In this repository, if we already have the objects for bundle-1 and then try to fetch from this list, the naive approach will fail. bundle-4 requires both bundle-3 and bundle-2, though bundle-3 will successfully unbundle without bundle-2. Thus, the algorithm needs to keep this in mind. A later implementation detail will store the maximum creationToken seen during such a bundle download, and the client will avoid downloading a bundle unless its creationToken is strictly greater than that stored value. For now, if the client seeks to download from an identical bundle list since its previous download, it will download the most-recent bundle then stop since its required commits are already in the object store. Add tests that exercise this behavior, but we will expand upon these tests when incremental downloads during 'git fetch' make use of creationToken values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 13:29:14 +00:00
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-4.bundle
bundle-uri: download in creationToken order The creationToken heuristic provides an ordering on the bundles advertised by a bundle list. Teach the Git client to download bundles differently when this heuristic is advertised. The bundles in the list are sorted by their advertised creationToken values, then downloaded in decreasing order. This avoids the previous strategy of downloading bundles in an arbitrary order and attempting to apply them (likely failing in the case of required commits) until discovering the order through attempted unbundling. During a fresh 'git clone', it may make sense to download the bundles in increasing order, since that would prevent the need to attempt unbundling a bundle with required commits that do not exist in our empty object store. The cost of testing an unbundle is quite low, and instead the chosen order is optimizing for a future bundle download during a 'git fetch' operation with a non-empty object store. Since the Git client continues fetching from the Git remote after downloading and unbundling bundles, the client's object store can be ahead of the bundle provider's object store. The next time it attempts to download from the bundle list, it makes most sense to download only the most-recent bundles until all tips successfully unbundle. The strategy implemented here provides that short-circuit where the client downloads a minimal set of bundles. However, we are not satisfied by the naive approach of downloading bundles until one successfully unbundles, expecting the earlier bundles to successfully unbundle now. The example repository in t5558 demonstrates this well: ---------------- bundle-4 4 / \ ----|---|------- bundle-3 | | | 3 | | ----|---|------- bundle-2 | | 2 | | | ----|---|------- bundle-1 \ / 1 | (previous commits) In this repository, if we already have the objects for bundle-1 and then try to fetch from this list, the naive approach will fail. bundle-4 requires both bundle-3 and bundle-2, though bundle-3 will successfully unbundle without bundle-2. Thus, the algorithm needs to keep this in mind. A later implementation detail will store the maximum creationToken seen during such a bundle download, and the client will avoid downloading a bundle unless its creationToken is strictly greater than that stored value. For now, if the client seeks to download from an identical bundle list since its previous download, it will download the most-recent bundle then stop since its required commits are already in the object store. Add tests that exercise this behavior, but we will expand upon these tests when incremental downloads during 'git fetch' make use of creationToken values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 13:29:14 +00:00
$HTTPD_URL/bundle-3.bundle
$HTTPD_URL/bundle-2.bundle
$HTTPD_URL/bundle-1.bundle
EOF
test_remote_https_urls <trace-clone.txt >actual &&
test_cmp expect actual
'
test_expect_success 'clone incomplete bundle list (http, creationToken)' '
test_when_finished rm -f trace*.txt &&
cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
EOF
GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
git clone --bundle-uri="$HTTPD_URL/bundle-list" \
--single-branch --branch=base --no-tags \
"$HTTPD_URL/smart/fetch.git" clone-token-http &&
test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken &&
bundle-uri: download in creationToken order The creationToken heuristic provides an ordering on the bundles advertised by a bundle list. Teach the Git client to download bundles differently when this heuristic is advertised. The bundles in the list are sorted by their advertised creationToken values, then downloaded in decreasing order. This avoids the previous strategy of downloading bundles in an arbitrary order and attempting to apply them (likely failing in the case of required commits) until discovering the order through attempted unbundling. During a fresh 'git clone', it may make sense to download the bundles in increasing order, since that would prevent the need to attempt unbundling a bundle with required commits that do not exist in our empty object store. The cost of testing an unbundle is quite low, and instead the chosen order is optimizing for a future bundle download during a 'git fetch' operation with a non-empty object store. Since the Git client continues fetching from the Git remote after downloading and unbundling bundles, the client's object store can be ahead of the bundle provider's object store. The next time it attempts to download from the bundle list, it makes most sense to download only the most-recent bundles until all tips successfully unbundle. The strategy implemented here provides that short-circuit where the client downloads a minimal set of bundles. However, we are not satisfied by the naive approach of downloading bundles until one successfully unbundles, expecting the earlier bundles to successfully unbundle now. The example repository in t5558 demonstrates this well: ---------------- bundle-4 4 / \ ----|---|------- bundle-3 | | | 3 | | ----|---|------- bundle-2 | | 2 | | | ----|---|------- bundle-1 \ / 1 | (previous commits) In this repository, if we already have the objects for bundle-1 and then try to fetch from this list, the naive approach will fail. bundle-4 requires both bundle-3 and bundle-2, though bundle-3 will successfully unbundle without bundle-2. Thus, the algorithm needs to keep this in mind. A later implementation detail will store the maximum creationToken seen during such a bundle download, and the client will avoid downloading a bundle unless its creationToken is strictly greater than that stored value. For now, if the client seeks to download from an identical bundle list since its previous download, it will download the most-recent bundle then stop since its required commits are already in the object store. Add tests that exercise this behavior, but we will expand upon these tests when incremental downloads during 'git fetch' make use of creationToken values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 13:29:14 +00:00
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
bundle-uri: download in creationToken order The creationToken heuristic provides an ordering on the bundles advertised by a bundle list. Teach the Git client to download bundles differently when this heuristic is advertised. The bundles in the list are sorted by their advertised creationToken values, then downloaded in decreasing order. This avoids the previous strategy of downloading bundles in an arbitrary order and attempting to apply them (likely failing in the case of required commits) until discovering the order through attempted unbundling. During a fresh 'git clone', it may make sense to download the bundles in increasing order, since that would prevent the need to attempt unbundling a bundle with required commits that do not exist in our empty object store. The cost of testing an unbundle is quite low, and instead the chosen order is optimizing for a future bundle download during a 'git fetch' operation with a non-empty object store. Since the Git client continues fetching from the Git remote after downloading and unbundling bundles, the client's object store can be ahead of the bundle provider's object store. The next time it attempts to download from the bundle list, it makes most sense to download only the most-recent bundles until all tips successfully unbundle. The strategy implemented here provides that short-circuit where the client downloads a minimal set of bundles. However, we are not satisfied by the naive approach of downloading bundles until one successfully unbundles, expecting the earlier bundles to successfully unbundle now. The example repository in t5558 demonstrates this well: ---------------- bundle-4 4 / \ ----|---|------- bundle-3 | | | 3 | | ----|---|------- bundle-2 | | 2 | | | ----|---|------- bundle-1 \ / 1 | (previous commits) In this repository, if we already have the objects for bundle-1 and then try to fetch from this list, the naive approach will fail. bundle-4 requires both bundle-3 and bundle-2, though bundle-3 will successfully unbundle without bundle-2. Thus, the algorithm needs to keep this in mind. A later implementation detail will store the maximum creationToken seen during such a bundle download, and the client will avoid downloading a bundle unless its creationToken is strictly greater than that stored value. For now, if the client seeks to download from an identical bundle list since its previous download, it will download the most-recent bundle then stop since its required commits are already in the object store. Add tests that exercise this behavior, but we will expand upon these tests when incremental downloads during 'git fetch' make use of creationToken values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 13:29:14 +00:00
$HTTPD_URL/bundle-1.bundle
EOF
bundle-uri: download in creationToken order The creationToken heuristic provides an ordering on the bundles advertised by a bundle list. Teach the Git client to download bundles differently when this heuristic is advertised. The bundles in the list are sorted by their advertised creationToken values, then downloaded in decreasing order. This avoids the previous strategy of downloading bundles in an arbitrary order and attempting to apply them (likely failing in the case of required commits) until discovering the order through attempted unbundling. During a fresh 'git clone', it may make sense to download the bundles in increasing order, since that would prevent the need to attempt unbundling a bundle with required commits that do not exist in our empty object store. The cost of testing an unbundle is quite low, and instead the chosen order is optimizing for a future bundle download during a 'git fetch' operation with a non-empty object store. Since the Git client continues fetching from the Git remote after downloading and unbundling bundles, the client's object store can be ahead of the bundle provider's object store. The next time it attempts to download from the bundle list, it makes most sense to download only the most-recent bundles until all tips successfully unbundle. The strategy implemented here provides that short-circuit where the client downloads a minimal set of bundles. However, we are not satisfied by the naive approach of downloading bundles until one successfully unbundles, expecting the earlier bundles to successfully unbundle now. The example repository in t5558 demonstrates this well: ---------------- bundle-4 4 / \ ----|---|------- bundle-3 | | | 3 | | ----|---|------- bundle-2 | | 2 | | | ----|---|------- bundle-1 \ / 1 | (previous commits) In this repository, if we already have the objects for bundle-1 and then try to fetch from this list, the naive approach will fail. bundle-4 requires both bundle-3 and bundle-2, though bundle-3 will successfully unbundle without bundle-2. Thus, the algorithm needs to keep this in mind. A later implementation detail will store the maximum creationToken seen during such a bundle download, and the client will avoid downloading a bundle unless its creationToken is strictly greater than that stored value. For now, if the client seeks to download from an identical bundle list since its previous download, it will download the most-recent bundle then stop since its required commits are already in the object store. Add tests that exercise this behavior, but we will expand upon these tests when incremental downloads during 'git fetch' make use of creationToken values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 13:29:14 +00:00
test_remote_https_urls <trace-clone.txt >actual &&
test_cmp expect actual &&
# We now have only one bundle ref.
git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-\EOF &&
refs/bundles/base
EOF
test_cmp expect refs &&
# Add remaining bundles, exercising the "deepening" strategy
# for downloading via the creationToken heurisitc.
cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
[bundle "bundle-4"]
uri = bundle-4.bundle
creationToken = 4
EOF
GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
git -C clone-token-http fetch origin --no-tags \
refs/heads/merge:refs/heads/merge &&
test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-4.bundle
$HTTPD_URL/bundle-3.bundle
$HTTPD_URL/bundle-2.bundle
EOF
test_remote_https_urls <trace1.txt >actual &&
test_cmp expect actual &&
# We now have all bundle refs.
git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-\EOF &&
refs/bundles/base
refs/bundles/left
refs/bundles/merge
refs/bundles/right
EOF
test_cmp expect refs
'
test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
test_when_finished rm -rf fetch-http-4 trace*.txt &&
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
EOF
GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
git clone --single-branch --branch=base \
--bundle-uri="$HTTPD_URL/bundle-list" \
"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-1.bundle
EOF
test_remote_https_urls <trace-clone.txt >actual &&
test_cmp expect actual &&
# only received base ref from bundle-1
git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-\EOF &&
refs/bundles/base
EOF
test_cmp expect refs &&
cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
EOF
# Fetch the objects for bundle-2 _and_ bundle-3.
GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
git -C fetch-http-4 fetch origin --no-tags \
refs/heads/left:refs/heads/left \
refs/heads/right:refs/heads/right &&
test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-2.bundle
EOF
test_remote_https_urls <trace1.txt >actual &&
test_cmp expect actual &&
# received left from bundle-2
git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-\EOF &&
refs/bundles/base
refs/bundles/left
EOF
test_cmp expect refs &&
# No-op fetch
GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \
git -C fetch-http-4 fetch origin --no-tags \
refs/heads/left:refs/heads/left \
refs/heads/right:refs/heads/right &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
EOF
test_remote_https_urls <trace1b.txt >actual &&
test_cmp expect actual &&
cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
[bundle "bundle-4"]
uri = bundle-4.bundle
creationToken = 4
EOF
# This fetch should skip bundle-3.bundle, since its objects are
# already local (we have the requisite commits for bundle-4.bundle).
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
git -C fetch-http-4 fetch origin --no-tags \
refs/heads/merge:refs/heads/merge &&
test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-4.bundle
EOF
test_remote_https_urls <trace2.txt >actual &&
test_cmp expect actual &&
# received merge ref from bundle-4, but right is missing
# because we did not download bundle-3.
git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-\EOF &&
refs/bundles/base
refs/bundles/left
refs/bundles/merge
EOF
test_cmp expect refs &&
# No-op fetch
GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
git -C fetch-http-4 fetch origin &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
EOF
test_remote_https_urls <trace2b.txt >actual &&
test_cmp expect actual
'
test_expect_success 'creationToken heuristic with failed downloads (clone)' '
test_when_finished rm -rf download-* trace*.txt &&
# Case 1: base bundle does not exist, nothing can unbundle
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = fake.bundle
creationToken = 1
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
[bundle "bundle-4"]
uri = bundle-4.bundle
creationToken = 4
EOF
GIT_TRACE2_EVENT="$(pwd)/trace-clone-1.txt" \
git clone --single-branch --branch=base \
--bundle-uri="$HTTPD_URL/bundle-list" \
"$HTTPD_URL/smart/fetch.git" download-1 &&
# Bundle failure does not set these configs.
test_must_fail git -C download-1 config fetch.bundleuri &&
test_must_fail git -C download-1 config fetch.bundlecreationtoken &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-4.bundle
$HTTPD_URL/bundle-3.bundle
$HTTPD_URL/bundle-2.bundle
$HTTPD_URL/fake.bundle
EOF
test_remote_https_urls <trace-clone-1.txt >actual &&
test_cmp expect actual &&
# All bundles failed to unbundle
git -C download-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
test_must_be_empty refs &&
# Case 2: middle bundle does not exist, only two bundles can unbundle
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = fake.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
[bundle "bundle-4"]
uri = bundle-4.bundle
creationToken = 4
EOF
GIT_TRACE2_EVENT="$(pwd)/trace-clone-2.txt" \
git clone --single-branch --branch=base \
--bundle-uri="$HTTPD_URL/bundle-list" \
"$HTTPD_URL/smart/fetch.git" download-2 &&
# Bundle failure does not set these configs.
test_must_fail git -C download-2 config fetch.bundleuri &&
test_must_fail git -C download-2 config fetch.bundlecreationtoken &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-4.bundle
$HTTPD_URL/bundle-3.bundle
$HTTPD_URL/fake.bundle
$HTTPD_URL/bundle-1.bundle
EOF
test_remote_https_urls <trace-clone-2.txt >actual &&
test_cmp expect actual &&
# bundle-1 and bundle-3 could unbundle, but bundle-4 could not
git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-EOF &&
refs/bundles/base
refs/bundles/right
EOF
test_cmp expect refs &&
# Case 3: top bundle does not exist, rest unbundle fine.
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
[bundle "bundle-4"]
uri = fake.bundle
creationToken = 4
EOF
GIT_TRACE2_EVENT="$(pwd)/trace-clone-3.txt" \
git clone --single-branch --branch=base \
--bundle-uri="$HTTPD_URL/bundle-list" \
"$HTTPD_URL/smart/fetch.git" download-3 &&
# As long as we have continguous successful downloads,
# we _do_ set these configs.
test_cmp_config -C download-3 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
test_cmp_config -C download-3 3 fetch.bundlecreationtoken &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/fake.bundle
$HTTPD_URL/bundle-3.bundle
$HTTPD_URL/bundle-2.bundle
$HTTPD_URL/bundle-1.bundle
EOF
test_remote_https_urls <trace-clone-3.txt >actual &&
test_cmp expect actual &&
# fake.bundle did not unbundle, but the others did.
git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-EOF &&
refs/bundles/base
refs/bundles/left
refs/bundles/right
EOF
test_cmp expect refs
'
# Expand the bundle list to include other interesting shapes, specifically
# interesting for use when fetching from a previous state.
#
# ---------------- bundle-7
# 7
# _/|\_
# ---/--|--\------ bundle-6
# 5 | 6
# --|---|---|----- bundle-4
# | 4 |
# | / \ /
# --|-|---|/------ bundle-3 (the client will be caught up to this point.)
# \ | 3
# ---\|---|------- bundle-2
# 2 |
# ----|---|------- bundle-1
# \ /
# 1
# |
# (previous commits)
test_expect_success 'expand incremental bundle list' '
(
cd clone-from &&
git checkout -b lefter left &&
test_commit 5 &&
git checkout -b righter right &&
test_commit 6 &&
git checkout -b top lefter &&
git merge -m "7" merge righter &&
git bundle create bundle-6.bundle lefter righter --not left right &&
git bundle create bundle-7.bundle top --not lefter merge righter &&
cp bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/"
) &&
git -C "$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" fetch origin +refs/heads/*:refs/heads/*
'
test_expect_success 'creationToken heuristic with failed downloads (fetch)' '
test_when_finished rm -rf download-* trace*.txt &&
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
EOF
git clone --single-branch --branch=left \
--bundle-uri="$HTTPD_URL/bundle-list" \
"$HTTPD_URL/smart/fetch.git" fetch-base &&
test_cmp_config -C fetch-base "$HTTPD_URL/bundle-list" fetch.bundleURI &&
test_cmp_config -C fetch-base 3 fetch.bundleCreationToken &&
# Case 1: all bundles exist: successful unbundling of all bundles
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
[bundle "bundle-4"]
uri = bundle-4.bundle
creationToken = 4
[bundle "bundle-6"]
uri = bundle-6.bundle
creationToken = 6
[bundle "bundle-7"]
uri = bundle-7.bundle
creationToken = 7
EOF
cp -r fetch-base fetch-1 &&
GIT_TRACE2_EVENT="$(pwd)/trace-fetch-1.txt" \
git -C fetch-1 fetch origin &&
test_cmp_config -C fetch-1 7 fetch.bundlecreationtoken &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-7.bundle
$HTTPD_URL/bundle-6.bundle
$HTTPD_URL/bundle-4.bundle
EOF
test_remote_https_urls <trace-fetch-1.txt >actual &&
test_cmp expect actual &&
# Check which bundles have unbundled by refs
git -C fetch-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-EOF &&
refs/bundles/base
refs/bundles/left
refs/bundles/lefter
refs/bundles/merge
refs/bundles/right
refs/bundles/righter
refs/bundles/top
EOF
test_cmp expect refs &&
# Case 2: middle bundle does not exist, only bundle-4 can unbundle
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
[bundle "bundle-4"]
uri = bundle-4.bundle
creationToken = 4
[bundle "bundle-6"]
uri = fake.bundle
creationToken = 6
[bundle "bundle-7"]
uri = bundle-7.bundle
creationToken = 7
EOF
cp -r fetch-base fetch-2 &&
GIT_TRACE2_EVENT="$(pwd)/trace-fetch-2.txt" \
git -C fetch-2 fetch origin &&
# Since bundle-7 fails to unbundle, do not update creation token.
test_cmp_config -C fetch-2 3 fetch.bundlecreationtoken &&
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/bundle-7.bundle
$HTTPD_URL/fake.bundle
$HTTPD_URL/bundle-4.bundle
EOF
test_remote_https_urls <trace-fetch-2.txt >actual &&
test_cmp expect actual &&
# Check which bundles have unbundled by refs
git -C fetch-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-EOF &&
refs/bundles/base
refs/bundles/left
refs/bundles/merge
refs/bundles/right
EOF
test_cmp expect refs &&
# Case 3: top bundle does not exist, rest unbundle fine.
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
[bundle "bundle-4"]
uri = bundle-4.bundle
creationToken = 4
[bundle "bundle-6"]
uri = bundle-6.bundle
creationToken = 6
[bundle "bundle-7"]
uri = fake.bundle
creationToken = 7
EOF
cp -r fetch-base fetch-3 &&
GIT_TRACE2_EVENT="$(pwd)/trace-fetch-3.txt" \
git -C fetch-3 fetch origin &&
# As long as we have continguous successful downloads,
# we _do_ set the maximum creation token.
test_cmp_config -C fetch-3 6 fetch.bundlecreationtoken &&
# NOTE: the fetch skips bundle-4 since bundle-6 successfully
# unbundles itself and bundle-7 failed to download.
cat >expect <<-EOF &&
$HTTPD_URL/bundle-list
$HTTPD_URL/fake.bundle
$HTTPD_URL/bundle-6.bundle
EOF
test_remote_https_urls <trace-fetch-3.txt >actual &&
test_cmp expect actual &&
# Check which bundles have unbundled by refs
git -C fetch-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
cat >expect <<-EOF &&
refs/bundles/base
refs/bundles/left
refs/bundles/lefter
refs/bundles/right
refs/bundles/righter
EOF
test_cmp expect refs
'
test_expect_success 'bundles are downloaded once during fetch --all' '
test_when_finished rm -rf download-* trace*.txt fetch-mult &&
cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "bundle-1"]
uri = bundle-1.bundle
creationToken = 1
[bundle "bundle-2"]
uri = bundle-2.bundle
creationToken = 2
[bundle "bundle-3"]
uri = bundle-3.bundle
creationToken = 3
EOF
git clone --single-branch --branch=left \
--bundle-uri="$HTTPD_URL/bundle-list" \
"$HTTPD_URL/smart/fetch.git" fetch-mult &&
git -C fetch-mult remote add dup1 "$HTTPD_URL/smart/fetch.git" &&
git -C fetch-mult remote add dup2 "$HTTPD_URL/smart/fetch.git" &&
GIT_TRACE2_EVENT="$(pwd)/trace-mult.txt" \
git -C fetch-mult fetch --all &&
grep "\"child_start\".*\"git-remote-https\",\"$HTTPD_URL/bundle-list\"" \
trace-mult.txt >bundle-fetches &&
test_line_count = 1 bundle-fetches
'
# Do not add tests here unless they use the HTTP server, as they will
# not run unless the HTTP dependencies exist.
test_done