fast-import: fix over-allocation of marks storage
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.
This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:
- we added a call in option_rewrite_submodules() which uses a separate
mark set; pushing down on "marks" is outright wrong here. We'd
corrupt the "marks" set, and we'd fail to correctly store any
submodule mappings with an id over 1024.
- the other callers passed "marks", but the push-down was still wrong.
In read_mark_file(), we take the pointer to the mark_set as a
parameter. So even though insert_mark() was updating the global
"marks", the local pointer we had in read_mark_file() was not
updated. As a result, we'd add a new level when needed, but then the
next call to insert_mark() wouldn't see it! It would then allocate a
new layer, which would also not be seen, and so on. Lookups for the
lost layers obviously wouldn't work, but before we even hit any
lookup stage, we'd generally run out of memory and die.
Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).
We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.
Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-15 15:38:49 +00:00
|
|
|
#!/bin/sh
|
|
|
|
|
|
|
|
test_description='test exotic situations with marks'
|
2024-08-14 06:52:23 +00:00
|
|
|
|
|
|
|
TEST_PASSES_SANITIZE_LEAK=true
|
fast-import: fix over-allocation of marks storage
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.
This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:
- we added a call in option_rewrite_submodules() which uses a separate
mark set; pushing down on "marks" is outright wrong here. We'd
corrupt the "marks" set, and we'd fail to correctly store any
submodule mappings with an id over 1024.
- the other callers passed "marks", but the push-down was still wrong.
In read_mark_file(), we take the pointer to the mark_set as a
parameter. So even though insert_mark() was updating the global
"marks", the local pointer we had in read_mark_file() was not
updated. As a result, we'd add a new level when needed, but then the
next call to insert_mark() wouldn't see it! It would then allocate a
new layer, which would also not be seen, and so on. Lookups for the
lost layers obviously wouldn't work, but before we even hit any
lookup stage, we'd generally run out of memory and die.
Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).
We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.
Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-15 15:38:49 +00:00
|
|
|
. ./test-lib.sh
|
|
|
|
|
|
|
|
test_expect_success 'setup dump of basic history' '
|
|
|
|
test_commit one &&
|
|
|
|
git fast-export --export-marks=marks HEAD >dump
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'setup large marks file' '
|
|
|
|
# normally a marks file would have a lot of useful, unique
|
|
|
|
# marks. But for our purposes, just having a lot of nonsense
|
|
|
|
# ones is fine. Start at 1024 to avoid clashing with marks
|
|
|
|
# legitimately used in our tiny dump.
|
|
|
|
blob=$(git rev-parse HEAD:one.t) &&
|
|
|
|
for i in $(test_seq 1024 16384)
|
|
|
|
do
|
2021-12-09 05:11:15 +00:00
|
|
|
echo ":$i $blob" || return 1
|
fast-import: fix over-allocation of marks storage
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.
This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:
- we added a call in option_rewrite_submodules() which uses a separate
mark set; pushing down on "marks" is outright wrong here. We'd
corrupt the "marks" set, and we'd fail to correctly store any
submodule mappings with an id over 1024.
- the other callers passed "marks", but the push-down was still wrong.
In read_mark_file(), we take the pointer to the mark_set as a
parameter. So even though insert_mark() was updating the global
"marks", the local pointer we had in read_mark_file() was not
updated. As a result, we'd add a new level when needed, but then the
next call to insert_mark() wouldn't see it! It would then allocate a
new layer, which would also not be seen, and so on. Lookups for the
lost layers obviously wouldn't work, but before we even hit any
lookup stage, we'd generally run out of memory and die.
Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).
We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.
Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-15 15:38:49 +00:00
|
|
|
done >>marks
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'import with large marks file' '
|
|
|
|
git fast-import --import-marks=marks <dump
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'setup dump with submodule' '
|
2022-07-29 19:21:53 +00:00
|
|
|
test_config_global protocol.file.allow always &&
|
fast-import: fix over-allocation of marks storage
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.
This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:
- we added a call in option_rewrite_submodules() which uses a separate
mark set; pushing down on "marks" is outright wrong here. We'd
corrupt the "marks" set, and we'd fail to correctly store any
submodule mappings with an id over 1024.
- the other callers passed "marks", but the push-down was still wrong.
In read_mark_file(), we take the pointer to the mark_set as a
parameter. So even though insert_mark() was updating the global
"marks", the local pointer we had in read_mark_file() was not
updated. As a result, we'd add a new level when needed, but then the
next call to insert_mark() wouldn't see it! It would then allocate a
new layer, which would also not be seen, and so on. Lookups for the
lost layers obviously wouldn't work, but before we even hit any
lookup stage, we'd generally run out of memory and die.
Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).
We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.
Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-15 15:38:49 +00:00
|
|
|
git submodule add "$PWD" sub &&
|
|
|
|
git commit -m "add submodule" &&
|
|
|
|
git fast-export HEAD >dump
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'setup submodule mapping with large id' '
|
|
|
|
old=$(git rev-parse HEAD:sub) &&
|
|
|
|
new=$(echo $old | sed s/./a/g) &&
|
|
|
|
echo ":12345 $old" >from &&
|
|
|
|
echo ":12345 $new" >to
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'import with submodule mapping' '
|
|
|
|
git init dst &&
|
|
|
|
git -C dst fast-import \
|
|
|
|
--rewrite-submodules-from=sub:../from \
|
|
|
|
--rewrite-submodules-to=sub:../to \
|
|
|
|
<dump &&
|
|
|
|
git -C dst rev-parse HEAD:sub >actual &&
|
|
|
|
echo "$new" >expect &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
2023-03-28 20:54:13 +00:00
|
|
|
test_expect_success 'paths adjusted for relative subdir' '
|
|
|
|
git init deep-dst &&
|
|
|
|
mkdir deep-dst/subdir &&
|
|
|
|
>deep-dst/subdir/empty-marks &&
|
|
|
|
git -C deep-dst/subdir fast-import \
|
|
|
|
--rewrite-submodules-from=sub:../../from \
|
|
|
|
--rewrite-submodules-to=sub:../../to \
|
|
|
|
--import-marks=empty-marks \
|
|
|
|
--export-marks=exported-marks \
|
|
|
|
--export-pack-edges=exported-edges \
|
|
|
|
<dump &&
|
|
|
|
# we do not bother checking resulting repo; we just care that nothing
|
|
|
|
# complained about failing to open files for reading, and that files
|
|
|
|
# for writing were created in the expected spot
|
|
|
|
test_path_is_file deep-dst/subdir/exported-marks &&
|
|
|
|
test_path_is_file deep-dst/subdir/exported-edges
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'relative marks are not affected by subdir' '
|
|
|
|
git init deep-relative &&
|
|
|
|
mkdir deep-relative/subdir &&
|
|
|
|
git -C deep-relative/subdir fast-import \
|
|
|
|
--relative-marks \
|
|
|
|
--export-marks=exported-marks \
|
|
|
|
<dump &&
|
|
|
|
test_path_is_missing deep-relative/subdir/exported-marks &&
|
|
|
|
test_path_is_file deep-relative/.git/info/fast-import/exported-marks
|
|
|
|
'
|
|
|
|
|
fast-import: fix over-allocation of marks storage
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.
This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:
- we added a call in option_rewrite_submodules() which uses a separate
mark set; pushing down on "marks" is outright wrong here. We'd
corrupt the "marks" set, and we'd fail to correctly store any
submodule mappings with an id over 1024.
- the other callers passed "marks", but the push-down was still wrong.
In read_mark_file(), we take the pointer to the mark_set as a
parameter. So even though insert_mark() was updating the global
"marks", the local pointer we had in read_mark_file() was not
updated. As a result, we'd add a new level when needed, but then the
next call to insert_mark() wouldn't see it! It would then allocate a
new layer, which would also not be seen, and so on. Lookups for the
lost layers obviously wouldn't work, but before we even hit any
lookup stage, we'd generally run out of memory and die.
Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).
We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.
Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-15 15:38:49 +00:00
|
|
|
test_done
|