git/builtin
Taylor Blau 4090511e40 builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
When using a multi-pack bitmap, pack-objects will try to perform its
traversal using a call to `traverse_bitmap_commit_list()`, which calls
`add_object_entry_from_bitmap()` to add each object it finds to its
packing list.

This path can cause pack-objects to add objects from packs that don't
have open pack_fds on them, by avoiding a call to `is_pack_valid()`.
This is because we only call `is_pack_valid()` on the preferred pack (in
order to do verbatim reuse via `reuse_partial_packfile_from_bitmap()`)
and not others when loading a MIDX bitmap.

In this case, `add_object_entry_from_bitmap()` will check whether it
wants each object entry by calling `want_object_in_pack()`, which will
call `want_found_object` (since its caller already supplied a
`found_pack`). In most cases (particularly without `--local`, and when
`ignored_packed_keep_on_disk` and `ignored_packed_keep_in_core` are
both "0"), we'll take the entry from the pack contained in the MIDX
bitmap, all without an open pack_fd.

When we then try to use that entry later to assemble the actual pack,
we'll be susceptible to any simultaneous writers moving that pack out of
the way (e.g., due to a concurrent repack) without having an open file
descriptor, causing races that result in errors like:

    remote: Enumerating objects: 1498802, done.
    remote: fatal: packfile ./objects/pack/pack-e57d433b5a588daa37fbe946e2b28dfaec03a93e.pack cannot be accessed
    remote: aborting due to possible repository corruption on the remote side.

This race can happen even with multi-pack bitmaps, since we may open a
MIDX bitmap that is being rewritten long before its packs are actually
unlinked.

Work around this by calling `is_pack_valid()` from within
`want_found_object()`, matching the behavior in
`want_object_in_pack_one()` (which has an analogous call). Most calls to
`is_pack_valid()` should be basically no-ops, since only the first call
requires us to open a file (subsequent calls realize the file is already
open, and return immediately).

Importantly, when `want_object_in_pack()` is given a non-NULL
`*found_pack`, but `want_found_object()` rejects the copy of the object
in that pack, we must reset `*found_pack` and `*found_offset` to NULL
and 0, respectively. Failing to do so could lead to other checks in
`want_object_in_pack()` (such as `want_object_in_pack_one()`) using the
same (invalid) pack as `*found_pack`, meaning that we don't call
`is_pack_valid()` because `p == *found_pack`. This can lead the caller
to believe it can use a copy of an object from an invalid pack.

An alternative approach to closing this race would have been to call
`is_pack_valid()` on _all_ packs in a multi-pack bitmap on load. This
has a couple of problems:

  - it is unnecessarily expensive in the cases where we don't actually
    need to open any packs (e.g., in `git rev-list --use-bitmap-index
    --count`)

  - more importantly, it means any time we would have hit this race,
    we'll avoid using bitmaps altogether, leading to significant
    slowdowns by forcing a full object traversal

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-24 14:27:20 -07:00
..
add.c i18n: turn even more messages into "cannot be used together" ones 2022-01-05 13:31:00 -08:00
am.c Merge branch 'ja/i18n-similar-messages' 2022-01-10 11:52:56 -08:00
annotate.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
apply.c
archive.c use xopen() to handle fatal open(2) failures 2021-08-25 14:39:08 -07:00
bisect--helper.c bisect--helper: add space between colon and following sentence 2021-10-01 15:47:53 -07:00
blame.c Merge branch 'ld/sparse-diff-blame' 2021-12-21 15:03:17 -08:00
branch.c Merge branch 'js/branch-track-inherit' 2022-01-20 15:25:38 -08:00
bugreport.c hook-list.h: add a generated list of hooks, like config-list.h 2021-09-27 09:44:54 -07:00
bundle.c Merge branch 'ab/bundle-remove-verbose-option' 2021-10-03 21:49:20 -07:00
cat-file.c i18n: turn even more messages into "cannot be used together" ones 2022-01-05 13:31:00 -08:00
check-attr.c
check-ignore.c dir.[ch]: replace dir_init() with DIR_INIT 2021-07-01 12:32:22 -07:00
check-mailmap.c shortlog: remove unused(?) "repo-abbrev" feature 2021-01-12 14:04:42 -08:00
check-ref-format.c
checkout--worker.c pkt-line.[ch]: remove unused packet_read_line_buf() 2021-10-15 13:09:40 -07:00
checkout-index.c Merge branch 'mt/parallel-checkout-part-3' 2021-05-16 21:05:23 +09:00
checkout.c Merge branch 'ab/checkout-branch-info-leakfix' 2022-01-24 09:14:46 -08:00
clean.c clean: do not attempt to remove startup_info->original_cwd 2021-12-09 13:33:13 -08:00
clone.c Merge branch 'ps/lockfile-cleanup-fix' 2022-01-12 15:11:43 -08:00
column.c column: fix parsing of the '--nl' option 2021-08-26 14:36:27 -07:00
commit-graph.c Merge branch 'ab/ignore-replace-while-working-on-commit-graph' 2021-11-01 13:48:08 -07:00
commit-tree.c use xopen() to handle fatal open(2) failures 2021-08-25 14:39:08 -07:00
commit.c i18n: turn even more messages into "cannot be used together" ones 2022-01-05 13:31:00 -08:00
config.c urlmatch.[ch]: add and use URLMATCH_CONFIG_INIT 2021-10-01 14:22:51 -07:00
count-objects.c
credential-cache--daemon.c unix-socket: add backlog size option to unix_stream_listen() 2021-03-15 14:32:51 -07:00
credential-cache.c credential-cache: check for windows specific errors 2021-09-14 09:30:54 -07:00
credential-store.c Use a better name for the function interpolating paths 2021-07-26 12:17:16 -07:00
credential.c doc: fix git credential synopsis 2021-10-28 09:57:09 -07:00
describe.c i18n: turn even more messages into "cannot be used together" ones 2022-01-05 13:31:00 -08:00
diff-files.c Merge branch 'jc/diffcore-rotate' 2021-02-25 16:43:30 -08:00
diff-index.c diff-index: restore -c/--cc options handling 2021-09-07 11:11:35 -07:00
diff-tree.c i18n: refactor "foo and bar are mutually exclusive" 2022-01-05 13:29:23 -08:00
diff.c diff: enable and test the sparse index 2021-12-06 09:55:06 -08:00
difftool.c i18n: turn "options are incompatible" into "cannot be used together" 2022-01-05 13:29:23 -08:00
env--helper.c assert PARSE_OPT_NONEG in parse-options callbacks 2020-09-30 12:53:47 -07:00
fast-export.c Merge branch 'ja/i18n-similar-messages' 2022-01-10 11:52:56 -08:00
fast-import.c usage.c API users: use die_message() for "fatal :" + exit 128 2021-12-07 13:25:15 -08:00
fetch-pack.c connect, transport: encapsulate arg in struct 2021-02-05 13:49:54 -08:00
fetch.c Merge branch 'ps/lockfile-cleanup-fix' 2022-01-12 15:11:43 -08:00
fmt-merge-msg.c merge: allow to pretend a merge is made into a different branch 2021-12-20 14:55:02 -08:00
for-each-ref.c for-each-ref: delay parsing of --sort=<atom> options 2021-10-20 14:33:07 -07:00
for-each-repo.c builtin/for-each-repo: remove unnecessary argv copy to plug leak 2021-07-26 12:19:20 -07:00
fsck.c run-command API users: use strvec_pushl(), not argv construction 2021-11-25 22:15:07 -08:00
gc.c usage.c + gc: add and use a die_message_errno() 2021-12-07 13:25:16 -08:00
get-tar-commit-id.c
grep.c grep: fix a "path_list" memory leak 2021-10-23 10:45:25 -07:00
hash-object.c Merge branch 'jc/prefix-filename-allocates' into maint 2021-10-12 13:51:32 -07:00
help.c run-command API users: use strvec_pushl(), not argv construction 2021-11-25 22:15:07 -08:00
index-pack.c i18n: factorize "--foo requires --bar" and the like 2022-01-05 13:31:00 -08:00
init-db.c i18n: refactor "foo and bar are mutually exclusive" 2022-01-05 13:29:23 -08:00
interpret-trailers.c
log.c Merge branch 'ja/i18n-similar-messages' 2022-01-10 11:52:56 -08:00
ls-files.c Merge branch 'ja/i18n-similar-messages' 2022-01-10 11:52:56 -08:00
ls-remote.c for-each-ref: delay parsing of --sort=<atom> options 2021-10-20 14:33:07 -07:00
ls-tree.c tree.h API: simplify read_tree_recursive() signature 2021-03-20 16:09:26 -07:00
mailinfo.c mailinfo: allow squelching quoted CRLF warning 2021-05-10 15:06:22 +09:00
mailsplit.c use xopen() to handle fatal open(2) failures 2021-08-25 14:39:08 -07:00
merge-base.c
merge-file.c xdiff: implement a zealous diff3, or "zdiff3" 2021-12-01 14:45:58 -08:00
merge-index.c merge-index: ensure full index 2021-04-14 13:47:21 -07:00
merge-ours.c builtins + test helpers: use return instead of exit() in cmd_* 2021-06-09 09:15:58 +09:00
merge-recursive.c
merge-tree.c xdiff users: use designated initializers for out_line 2021-05-11 12:47:31 +09:00
merge.c Merge branch 'ja/i18n-similar-messages' 2022-01-10 11:52:56 -08:00
mktag.c fsck: report invalid object type-path combinations 2021-10-01 15:06:01 -07:00
mktree.c builtins + test helpers: use return instead of exit() in cmd_* 2021-06-09 09:15:58 +09:00
multi-pack-index.c builtin/multi-pack-index.c: don't leak concatenated options 2021-10-28 15:32:14 -07:00
mv.c mv: refuse to move sparse paths 2021-09-28 10:31:02 -07:00
name-rev.c name-rev: prefer shorter names over following merges 2021-12-04 23:39:34 -08:00
notes.c Merge branch 'ab/usage-die-message' 2022-01-10 11:52:53 -08:00
pack-objects.c builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects 2022-05-24 14:27:20 -07:00
pack-redundant.c builtin/pack-redundant: avoid casting buffers to struct object_id 2021-04-27 16:31:38 +09:00
pack-refs.c
patch-id.c
prune-packed.c
prune.c Merge branch 'ns/tmp-objdir' 2022-01-03 16:24:15 -08:00
pull.c fetch/pull: use the sparse index 2021-12-22 11:42:39 -08:00
push.c i18n: turn "options are incompatible" into "cannot be used together" 2022-01-05 13:29:23 -08:00
range-diff.c column, range-diff: downcase option description 2021-03-29 14:06:08 -07:00
read-tree.c Change unpack_trees' 'reset' flag into an enum 2021-09-27 13:38:37 -07:00
rebase.c i18n: turn even more messages into "cannot be used together" ones 2022-01-05 13:31:00 -08:00
receive-pack.c Merge branch 'ns/tmp-objdir' 2022-01-03 16:24:15 -08:00
reflog.c reflog + refs-backend: move "verbose" out of the backend 2021-12-22 16:24:14 -08:00
remote-ext.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
remote-fd.c
remote.c Merge branch 'ab/designated-initializers-more' 2021-10-18 15:47:57 -07:00
repack.c Merge branch 'ja/i18n-similar-messages' 2022-01-10 11:52:56 -08:00
replace.c run-command API users: use strvec_pushl(), not argv construction 2021-11-25 22:15:07 -08:00
rerere.c xdiff users: use designated initializers for out_line 2021-05-11 12:47:31 +09:00
reset.c i18n: turn even more messages into "cannot be used together" ones 2022-01-05 13:31:00 -08:00
rev-list.c i18n: turn even more messages into "cannot be used together" ones 2022-01-05 13:31:00 -08:00
rev-parse.c refs: drop "broken" flag from for_each_fullref_in() 2021-09-27 12:36:45 -07:00
revert.c Merge branch 'ds/mergies-with-sparse-index' 2021-09-20 15:20:45 -07:00
rm.c Merge branch 'ja/i18n-similar-messages' 2022-01-10 11:52:56 -08:00
send-pack.c Merge branch 'jk/http-push-status-fix' 2021-10-29 15:43:12 -07:00
shortlog.c parse-options.[ch]: consistently use "enum parse_opt_result" 2021-10-08 14:13:11 -07:00
show-branch.c i18n: turn "options are incompatible" into "cannot be used together" 2022-01-05 13:29:23 -08:00
show-index.c builtin/show-index: set the algorithm for object IDs 2021-04-27 16:31:39 +09:00
show-ref.c refs: switch peel_ref() to peel_iterated_oid() 2021-01-21 15:51:31 -08:00
sparse-checkout.c Merge branch 'ds/sparse-checkout-malformed-pattern-fix' 2022-01-10 11:52:49 -08:00
stash.c Merge branch 'en/keep-cwd' into maint 2022-01-28 16:45:52 -08:00
stripspace.c
submodule--helper.c i18n: refactor "foo and bar are mutually exclusive" 2022-01-05 13:29:23 -08:00
symbolic-ref.c symbolic-ref: don't leak shortened refname in check_symref() 2021-03-14 15:57:59 -07:00
tag.c i18n: tag.c factorize i18n strings 2022-01-05 13:31:00 -08:00
unpack-file.c
unpack-objects.c hash: provide per-algorithm null OIDs 2021-04-27 16:31:39 +09:00
update-index.c use xopen() to handle fatal open(2) failures 2021-08-25 14:39:08 -07:00
update-ref.c update-ref: fix streaming of status updates 2021-09-03 11:35:15 -07:00
update-server-info.c
upload-archive.c upload-archive: use regular "struct child_process" pattern 2021-11-25 22:15:07 -08:00
upload-pack.c upload-pack: document and rename --advertise-refs 2021-08-05 08:59:37 -07:00
var.c var: add GIT_DEFAULT_BRANCH variable 2021-11-03 13:25:36 -07:00
verify-commit.c
verify-pack.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
verify-tag.c
worktree.c i18n: factorize "--foo requires --bar" and the like 2022-01-05 13:31:00 -08:00
write-tree.c