Commit graph

71670 commits

Author SHA1 Message Date
Junio C Hamano 546f8d2dcd Merge branch 'ps/reftable-fixes' into maint-2.43
Bunch of small fix-ups to the reftable code.

* ps/reftable-fixes:
  reftable/block: reuse buffer to compute record keys
  reftable/block: introduce macro to initialize `struct block_iter`
  reftable/merged: reuse buffer to compute record keys
  reftable/stack: fix use of unseeded randomness
  reftable/stack: fix stale lock when dying
  reftable/stack: reuse buffers when reloading stack
  reftable/stack: perform auto-compaction with transactional interface
  reftable/stack: verify that `reftable_stack_add()` uses auto-compaction
  reftable: handle interrupted writes
  reftable: handle interrupted reads
  reftable: wrap EXPECT macros in do/while
2024-02-08 16:22:07 -08:00
Junio C Hamano b471ea3a0d Merge branch 'jk/config-cleanup' into maint-2.43
Code clean-up around use of configuration variables.

* jk/config-cleanup:
  sequencer: simplify away extra git_config_string() call
  gpg-interface: drop pointless config_error_nonbool() checks
  push: drop confusing configset/callback redundancy
  config: use git_config_string() for core.checkRoundTripEncoding
  diff: give more detailed messages for bogus diff.* config
  config: use config_error_nonbool() instead of custom messages
  imap-send: don't use git_die_config() inside callback
  git_xmerge_config(): prefer error() to die()
  config: reject bogus values for core.checkstat
2024-02-08 16:22:07 -08:00
Junio C Hamano 6479e121c2 Merge branch 'rs/incompatible-options-messages' into maint-2.43
Clean-up code that handles combinations of incompatible options.

* rs/incompatible-options-messages:
  worktree: simplify incompatibility message for --orphan and commit-ish
  worktree: standardize incompatibility messages
  clean: factorize incompatibility message
  revision, rev-parse: factorize incompatibility messages about - -exclude-hidden
  revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs
  repack: use die_for_incompatible_opt3() for -A/-k/--cruft
  push: use die_for_incompatible_opt4() for - -delete/--tags/--all/--mirror
2024-02-08 16:22:06 -08:00
Junio C Hamano 1dbc46997a Merge branch 'mk/doc-gitfile-more' into maint-2.43
Doc update.

* mk/doc-gitfile-more:
  doc: make the gitfile syntax easier to discover
2024-02-08 16:22:06 -08:00
Junio C Hamano 67bb8ff5da Merge branch 'ps/ref-tests-update-more' into maint-2.43
Tests update.

* ps/ref-tests-update-more:
  t6301: write invalid object ID via `test-tool ref-store`
  t5551: stop writing packed-refs directly
  t5401: speed up creation of many branches
  t4013: simplify magic parsing and drop "failure"
  t3310: stop checking for reference existence via `test -f`
  t1417: make `reflog --updateref` tests backend agnostic
  t1410: use test-tool to create empty reflog
  t1401: stop treating FETCH_HEAD as real reference
  t1400: split up generic reflog tests from the reffile-specific ones
  t0410: mark tests to require the reffiles backend
2024-02-08 16:22:06 -08:00
Junio C Hamano a7ea468346 Merge branch 'rs/column-leakfix' into maint-2.43
Leakfix.

* rs/column-leakfix:
  column: release strbuf and string_list after use
2024-02-08 16:22:06 -08:00
Junio C Hamano 25e2039cf6 Merge branch 'rs/i18n-cannot-be-used-together' into maint-2.43
Clean-up code that handles combinations of incompatible options.

* rs/i18n-cannot-be-used-together:
  i18n: factorize even more 'incompatible options' messages
2024-02-08 16:22:05 -08:00
Junio C Hamano 173d7746f6 Merge branch 'jb/reflog-expire-delete-dry-run-options' into maint-2.43
Command line parsing fix for "git reflog".

* jb/reflog-expire-delete-dry-run-options:
  builtin/reflog.c: fix dry-run option short name
2024-02-08 16:22:05 -08:00
Junio C Hamano bcab40f14f Merge branch 'js/packfile-h-typofix' into maint-2.43
Typofix.

* js/packfile-h-typofix:
  packfile.c: fix a typo in `each_file_in_pack_dir_fn()`'s declaration
2024-02-08 16:22:05 -08:00
Junio C Hamano 1685e9ffe6 Merge branch 'jk/commit-graph-slab-clear-fix' into maint-2.43
Clearing in-core repository (happens during e.g., "git fetch
--recurse-submodules" with commit graph enabled) made in-core
commit object in an inconsistent state by discarding the necessary
data from commit-graph too early, which has been corrected.

* jk/commit-graph-slab-clear-fix:
  commit-graph: retain commit slab when closing NULL commit_graph
2024-02-08 16:22:05 -08:00
Junio C Hamano 3c2ee131f8 Merge branch 'cp/git-flush-is-an-env-bool' into maint-2.43
Unlike other environment variables that took the usual
true/false/yes/no as well as 0/1, GIT_FLUSH only understood 0/1,
which has been corrected.

* cp/git-flush-is-an-env-bool:
  write-or-die: make GIT_FLUSH a Boolean environment variable
2024-02-08 16:22:04 -08:00
Junio C Hamano 8566311a03 Merge branch 'jc/sparse-checkout-set-default-fix' into maint-2.43
"git sparse-checkout set" added default patterns even when the
patterns are being fed from the standard input, which has been
corrected.

* jc/sparse-checkout-set-default-fix:
  sparse-checkout: use default patterns for 'set' only !stdin
2024-02-08 16:22:04 -08:00
Junio C Hamano 878f8c42dc Merge branch 'jc/archive-list-with-extra-args' into maint-2.43
"git archive --list extra garbage" silently ignored excess command
line parameters, which has been corrected.

* jc/archive-list-with-extra-args:
  archive: "--list" does not take further options
2024-02-08 16:22:04 -08:00
Junio C Hamano a593e2fbce Merge branch 'rj/status-bisect-while-rebase' into maint-2.43
"git status" is taught to show both the branch being bisected and
being rebased when both are in effect at the same time.
cf. <xmqqil76kyov.fsf@gitster.g>

* rj/status-bisect-while-rebase:
  status: fix branch shown when not only bisecting
2024-02-08 16:22:04 -08:00
Junio C Hamano 8f7cc565e0 Merge branch 'sh/completion-with-reftable' into maint-2.43
Command line completion script (in contrib/) learned to work better
with the reftable backend.

* sh/completion-with-reftable:
  completion: support pseudoref existence checks for reftables
  completion: refactor existence checks for pseudorefs
2024-02-08 16:22:04 -08:00
Junio C Hamano ce54593289 Merge branch 'jx/fetch-atomic-error-message-fix' into maint-2.43
"git fetch --atomic" issued an unnecessary empty error message,
which has been corrected.
cf. <ZX__e7VjyLXIl-uV@tanuki>

* jx/fetch-atomic-error-message-fix:
  fetch: no redundant error message for atomic fetch
  t5574: test porcelain output of atomic fetch
2024-02-08 16:22:03 -08:00
Junio C Hamano 0e92593acf Merge branch 'jk/mailinfo-iterative-unquote-comment' into maint-2.43
The code to parse the From e-mail header has been updated to avoid
recursion.

* jk/mailinfo-iterative-unquote-comment:
  mailinfo: avoid recursion when unquoting From headers
  t5100: make rfc822 comment test more careful
  mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
2024-02-08 16:22:03 -08:00
Junio C Hamano 952916f9e0 Merge branch 'rs/show-ref-incompatible-options' into maint-2.43
Code clean-up for sanity checking of command line options for "git
show-ref".

* rs/show-ref-incompatible-options:
  show-ref: use die_for_incompatible_opt3()
2024-02-08 16:22:03 -08:00
Junio C Hamano 28b47452b3 Merge branch 'jk/implicit-true' into maint-2.43
Some codepaths did not correctly parse configuration variables
specified with valueless "true", which has been corrected.

* jk/implicit-true:
  fsck: handle NULL value when parsing message config
  trailer: handle NULL value when parsing trailer-specific config
  submodule: handle NULL value when parsing submodule.*.branch
  help: handle NULL value for alias.* config
  trace2: handle NULL values in tr2_sysenv config callback
  setup: handle NULL value when parsing extensions
  config: handle NULL value when parsing non-bools
2024-02-08 16:22:03 -08:00
Junio C Hamano 5baedc68b0 Merge branch 'jk/bisect-reset-fix' into maint-2.43
"git bisect reset" has been taught to clean up state files and refs
even when BISECT_START file is gone.

* jk/bisect-reset-fix:
  bisect: always clean on reset
2024-02-08 16:22:03 -08:00
Junio C Hamano 19fa15fb2d Merge branch 'jk/end-of-options' into maint-2.43
"git $cmd --end-of-options --rev -- --path" for some $cmd failed
to interpret "--rev" as a rev, and "--path" as a path.  This was
fixed for many programs like "reset" and "checkout".

* jk/end-of-options:
  parse-options: decouple "--end-of-options" and "--"
2024-02-08 16:22:02 -08:00
Junio C Hamano 4b50f86141 Merge branch 'jc/revision-parse-int' into maint-2.43
The command line parser for the "log" family of commands was too
loose when parsing certain numbers, e.g., silently ignoring the
extra 'q' in "git log -n 1q" without complaining, which has been
tightened up.

* jc/revision-parse-int:
  revision: parse integer arguments to --max-count, --skip, etc., more carefully
2024-02-08 16:22:02 -08:00
Junio C Hamano 7c05241877 Merge branch 'jp/use-diff-index-in-pre-commit-sample' into maint-2.43
The sample pre-commit hook that tries to catch introduction of new
paths that use potentially non-portable characters did not notice
an existing path getting renamed to such a problematic path, when
rename detection was enabled.

* jp/use-diff-index-in-pre-commit-sample:
  hooks--pre-commit: detect non-ASCII when renaming
2024-02-08 16:22:02 -08:00
Junio C Hamano 13031f6689 Merge branch 'jh/trace2-redact-auth' into maint-2.43
trace2 streams used to record the URLs that potentially embed
authentication material, which has been corrected.

* jh/trace2-redact-auth:
  t0212: test URL redacting in EVENT format
  t0211: test URL redacting in PERF format
  trace2: redact passwords from https:// URLs by default
  trace2: fix signature of trace2_def_param() macro
2024-02-08 16:22:01 -08:00
Junio C Hamano efbae0583b Merge branch 'js/update-urls-in-doc-and-comment' into maint-2.43
Stale URLs have been updated to their current counterparts (or
archive.org) and HTTP links are replaced with working HTTPS links.

* js/update-urls-in-doc-and-comment:
  doc: refer to internet archive
  doc: update links for andre-simon.de
  doc: switch links to https
  doc: update links to current pages
2024-02-08 16:22:01 -08:00
Junio C Hamano 50b8f513a2 Merge branch 'ps/commit-graph-less-paranoid' into maint-2.43
Earlier we stopped relying on commit-graph that (still) records
information about commits that are lost from the object store,
which has negative performance implications.  The default has been
flipped to disable this pessimization.

* ps/commit-graph-less-paranoid:
  commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
2024-02-08 16:22:01 -08:00
Junio C Hamano f8e2ad965a Merge branch 'tz/send-email-negatable-options' into maint-2.43
Newer versions of Getopt::Long started giving warnings against our
(ab)use of it in "git send-email".  Bump the minimum version
requirement for Perl to 5.8.1 (from September 2002) to allow
simplifying our implementation.

* tz/send-email-negatable-options:
  send-email: avoid duplicate specification warnings
  perl: bump the required Perl version to 5.8.1 from 5.8.0
2024-02-08 16:22:01 -08:00
Junio C Hamano c8bcf66bf7 Merge branch 'js/ci-discard-prove-state' into maint-2.43
The way CI testing used "prove" could lead to running the test
suite twice needlessly, which has been corrected.

* js/ci-discard-prove-state:
  ci: avoid running the test suite _twice_
  ci: add support for GitLab CI
  ci: install test dependencies for linux-musl
  ci: squelch warnings when testing with unusable Git repo
  ci: unify setup of some environment variables
  ci: split out logic to set up failed test artifacts
  ci: group installation of Docker dependencies
  ci: make grouping setup more generic
  ci: reorder definitions for grouping functions
2024-02-08 16:22:00 -08:00
Jeff King d70f554cdf commit-graph: retain commit slab when closing NULL commit_graph
This fixes a regression introduced in ac6d45d11f (commit-graph: move
slab-clearing to close_commit_graph(), 2023-10-03), in which running:

  git -c fetch.writeCommitGraph=true fetch --recurse-submodules

multiple times in a freshly cloned repository causes a segfault. What
happens in the second (and subsequent) runs is this:

  1. We make a "struct commit" for any ref tips which we're storing
     (even if we already have them, they still go into FETCH_HEAD).

     Because the first run will have created a commit graph, we'll find
     those commits in the graph.

     The commit struct is therefore created with a NULL "maybe_tree"
     entry, because we can load its oid from the graph later. But to do
     that we need to remember that we got the commit from the graph,
     which is recorded in a global commit_graph_data_slab object.

  2. Because we're using --recurse-submodules, we'll try to fetch each
     of the possible submodules. That implies creating a separate
     "struct repository" in-process for each submodule, which will
     require a later call to repo_clear().

     The call to repo_clear() calls raw_object_store_clear(), which in
     turn calls close_object_store(), which in turn calls
     close_commit_graph(). And the latter frees the commit graph data
     slab.

  3. Later, when trying to write out a new commit graph, we'll ask for
     their tree oid via get_commit_tree_oid(), which will see that the
     object is parsed but with a NULL maybe_tree field. We'd then
     usually pull it from the graph file, but because the slab was
     cleared, we don't realize that we can do so! We end up returning
     NULL and segfaulting.

     (It seems questionable that we'd write a graph entry for such a
     commit anyway, since we know we already have one. I didn't
     double-check, but that may simply be another side effect of having
     cleared the slab).

The bug is in step (2) above. We should not be clearing the slab when
cleaning up the submodule repository structs. Prior to ac6d45d11f, we
did not do so because it was done inside a helper function that returned
early when it saw NULL. So the behavior change from that commit is that
we'll now _always_ clear the slab via repo_clear(), even if the
repository being closed did not have a commit graph (and thus would have
a NULL commit_graph struct).

The most immediate fix is to add in a NULL check in close_commit_graph(),
making it a true noop when passed in an object_store with a NULL
commit_graph (it's OK to just return early, since the rest of its code
is already a noop when passed NULL). That restores the pre-ac6d45d11f
behavior. And that's what this patch does, along with a test that
exercises it (we already have a test that uses submodules along with
fetch.writeCommitGraph, but the bug only triggers when there is a
subsequent fetch and when that fetch uses --recurse-submodules).

So that fixes the regression in the least-risky way possible.

I do think there's some fragility here that we might want to follow up
on. We have a global commit_graph_data_slab that contains graph
positions, and our global commit structs depend on the that slab
remaining valid. But close_commit_graph() is just about closing _one_
object store's graph. So it's dangerous to call that function and clear
the slab without also throwing away any "struct commit" we might have
parsed that depends on it.

Which at first glance seems like a bug we could already trigger. In the
situation described here, there is no commit graph in the submodule
repository, so our commit graph is NULL (in fact, in our test script
there is no submodule repo at all, so we immediately return from
repo_init() and call repo_clear() only to free up memory). But what
would happen if there was one? Wouldn't we see a non-NULL commit_graph
entry, and then clear the global slab anyway?

The answer is "no", but for very bizarre reasons. Remember that
repo_clear() calls raw_object_store_clear(), which then calls
close_object_store() and thus close_commit_graph(). But before it does
so, raw_object_store_clear() does something else: it frees the commit
graph and sets it to NULL! So by this code path we'll _never_ see a
non-NULL commit_graph struct, and thus never clear the slab.

So it happens to work out. But it still seems questionable to me that we
would clear a global slab (which might still be in use) when closing the
commit graph. This clearing comes from 957ba814bf (commit-graph: when
closing the graph, also release the slab, 2021-09-08), and was fixing a
case where we really did need it to be closed (and in that case we
presumably call close_object_store() more directly).

So I suspect there may still be a bug waiting to happen there, as any
object loaded before the call to close_object_store() may be stranded
with a bogus maybe_tree entry (and thus looking at it after the call
might cause an error). But I'm not sure how to trigger it, nor what the
fix should look like (you probably would need to "unparse" any objects
pulled from the graph). And so this patch punts on that for now in favor
of fixing the recent regression in the most direct way, which should not
have any other fallouts.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-05 08:35:26 -08:00
Chandra Pratap 556e68032f write-or-die: make GIT_FLUSH a Boolean environment variable
Among Git's environment variables, the ones marked as "Boolean"
accept values in a way similar to Boolean configuration variables,
i.e. values like 'yes', 'on', 'true' and positive numbers are
taken as "on" and values like 'no', 'off', 'false' are taken as
"off".

GIT_FLUSH can be used to force Git to use non-buffered I/O when
writing to stdout. It can only accept two values, '1' which causes
Git to flush more often and '0' which makes all output buffered.
Make GIT_FLUSH accept more values besides '0' and '1' by turning it
into a Boolean environment variable, modifying the required logic.
Update the related documentation.

Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-04 10:32:21 -08:00
Junio C Hamano 53ded839ae sparse-checkout: use default patterns for 'set' only !stdin
"git sparse-checkout set ---no-cone" uses default patterns when none
is given from the command line, but it should do so ONLY when
--stdin is not being used.  Right now, add_patterns_from_input()
called when reading from the standard input is sloppy and does not
check if there are extra command line parameters that the command
will silently ignore, but that will change soon and not setting this
unnecessary and unused default patterns start to matter when it gets
fixed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26 12:15:58 -08:00
Junio C Hamano d6b6cd1393 archive: "--list" does not take further options
"git archive --list blah" should notice an extra command line
parameter that goes unused.  Make it so.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-21 10:33:09 -08:00
Stan Hu 44dbb3bf29 completion: support pseudoref existence checks for reftables
In contrib/completion/git-completion.bash, there are a bunch of
instances where we read pseudorefs, such as HEAD, MERGE_HEAD,
REVERT_HEAD, and others via the filesystem. However, the upcoming
reftable refs backend won't use '.git/HEAD' at all but instead will
write an invalid refname as placeholder for backwards compatibility,
which will break the git-completion script.

Update the '__git_pseudoref_exists' function to:

1. Recognize the placeholder '.git/HEAD' written by the reftable
   backend (its content is specified in the reftable specs).
2. If reftable is in use, use 'git rev-parse' to determine whether the
    given ref exists.
3. Otherwise, continue to use 'test -f' to check for the ref's filename.

Signed-off-by: Stan Hu <stanhu@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-19 15:11:58 -08:00
Stan Hu 666270a2df completion: refactor existence checks for pseudorefs
In preparation for the reftable backend, this commit introduces a
'__git_pseudoref_exists' function that continues to use 'test -f' to
determine whether a given pseudoref exists in the local filesystem.

Signed-off-by: Stan Hu <stanhu@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-19 15:11:56 -08:00
Jiang Xin 18ce48918c fetch: no redundant error message for atomic fetch
If an error occurs during an atomic fetch, a redundant error message
will appear at the end of do_fetch(). It was introduced in b3a804663c
(fetch: make `--atomic` flag cover backfilling of tags, 2022-02-17).

Because a failure message is displayed before setting retcode in the
function do_fetch(), calling error() on the err message at the end of
this function may result in redundant or empty error message to be
displayed.

We can remove the redundant error() function, because we know that
the function ref_transaction_abort() never fails. While we can find a
common pattern for calling ref_transaction_abort() by running command
"git grep -A1 ref_transaction_abort", e.g.:

    if (ref_transaction_abort(transaction, &error))
        error("abort: %s", error.buf);

Following this pattern, we can tolerate the return value of the function
ref_transaction_abort() being changed in the future. We also delay the
output of the err message to the end of do_fetch() to reduce redundant
code. With these changes, the test case "fetch porcelain output
(atomic)" in t5574 will also be fixed.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-18 08:30:33 -08:00
Jiang Xin 97d82b2963 t5574: test porcelain output of atomic fetch
The test case "fetch porcelain output" checks output of the fetch
command. The error output must be empty with the follow assertion:

    test_must_be_empty stderr

But this assertion fails if using atomic fetch. Refactor this test case
to use different fetch options by splitting it into three test cases.

  1. "setup for fetch porcelain output".

  2. "fetch porcelain output": for non-atomic fetch.

  3. "fetch porcelain output (atomic)": for atomic fetch.

Add new command "test_commit ..." in the first test case, so that if we
run these test cases individually (--run=4-6), "git rev-parse HEAD~"
command will work properly. Run the above test cases, we can find that
one test case has a known breakage, as shown below:

    ok 4 - setup for fetch porcelain output
    ok 5 - fetch porcelain output  # TODO known breakage vanished
    not ok 6 - fetch porcelain output (atomic) # TODO known breakage

The failed test case has an error message with only the error prompt but
no message body, as follows:

    'stderr' is not empty, it contains:
    error:

In a later commit, we will fix this issue.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-18 08:30:32 -08:00
Jeff King dee182941f mailinfo: avoid recursion when unquoting From headers
Our unquote_comment() function is recursive; when it sees a comment
within a comment, like:

  (this is an (embedded) comment)

it recurses to handle the inner comment. This is fine for practical use,
but it does mean that you can easily run out of stack space with a
malicious header. For example:

  perl -e 'print "From: ", "(" x 2**18;' |
  git mailinfo /dev/null /dev/null

segfaults on my system. And since mailinfo is likely to be fed untrusted
input from the Internet (if not by human users, who might recognize a
garbage header, but certainly there are automated systems that apply
patches from a list) it may be possible for an attacker to trigger the
problem.

That said, I don't think there's an interesting security vulnerability
here. All an attacker can do is make it impossible to parse their email
and apply their patch, and there are lots of ways to generate bogus
emails. So it's more of an annoyance than anything.

But it's pretty easy to fix it. The recursion is not helping us preserve
any particular state from each level. The only flag in our parsing is
take_next_literally, and we can never recurse when it is set (since the
start of a new comment implies it was not backslash-escaped). So it is
really only useful for finding the end of the matched pair of
parentheses. We can do that easily with a simple depth counter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:33:52 -08:00
Jeff King 2d9396c2fe t5100: make rfc822 comment test more careful
When processing "From" headers in an email, mailinfo "unquotes" quoted
strings and rfc822 parenthesized comments. For quoted strings, we
actually remove the double-quotes, so:

  From: "A U Thor" <someone@example.com>

become:

  Author: A U Thor
  Email: someone@example.com

But for comments, we leave the outer parentheses in place, so:

  From: A U (this is a comment) Thor <someone@example.com>

becomes:

  Author: A U (this is a comment) Thor
  Email: someone@example.com

So what is the comment "unquoting" actually doing? In our code, being in
a comment section has exactly two effects:

  1. We'll unquote backslash-escaped characters inside a comment
     section.

  2. We _won't_ unquote double-quoted strings inside a comment section.

Our test for comments in t5100 checks this:

  From: "A U Thor" <somebody@example.com> (this is \(really\) a comment (honestly))

So it is covering (1), but not (2). Let's add in a quoted string to
cover this.

Moreover, because the comment appears at the end of the From header,
there's nothing to confirm that we correctly found the end of the
comment section (and not just the end-of-string). Let's instead move it
to the beginning of the header, which means we can confirm that the
existing quoted string is detected (which will only happen if we know
we've left the comment block).

As expected, the test continues to pass, but this will give us more
confidence as we refactor the code in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:33:50 -08:00
Jeff King d1bd3a8c34 mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
When processing a header like a "From" line, mailinfo uses
unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized
comments. It takes a NUL-terminated string on input, and loops over the
"in" pointer until it sees the NUL. When it finds the start of an
interesting block, it delegates to helper functions which also increment
"in", and return the updated pointer.

But there's a bug here: the helpers find the NUL with a post-increment
in the loop condition, like:

   while ((c = *in++) != 0)

So when they do see a NUL (rather than the correct termination of the
quote or comment section), they return "in" as one _past_ the NUL
terminator. And thus the outer loop in unquote_quoted_pair() does not
realize we hit the NUL, and keeps reading past the end of the buffer.

We should instead make sure to return "in" positioned at the NUL, so
that the caller knows to stop their loop, too. A hacky way to do this is
to return "in - 1" after leaving the inner loop. But a slightly cleaner
solution is to avoid incrementing "in" until we are sure it contained a
non-NUL byte (i.e., doing it inside the loop body).

The two tests here show off the problem. Since we check the output,
they'll _usually_ report a failure in a normal build, but it depends on
what garbage bytes are found after the heap buffer. Building with
SANITIZE=address reliably notices the problem. The outcome (both the
exit code and the exact bytes) are just what Git happens to produce for
these cases today, and shouldn't be taken as an endorsement. It might be
reasonable to abort on an unterminated string, for example. The priority
for this patch is fixing the out-of-bounds memory access.

Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-12 15:32:49 -08:00
Patrick Steinhardt c0cadb0576 reftable/block: reuse buffer to compute record keys
When iterating over entries in the block iterator we compute the key of
each of the entries and write it into a buffer. We do not reuse the
buffer though and thus re-allocate it on every iteration, which is
wasteful.

Refactor the code to reuse the buffer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:17 -08:00
Patrick Steinhardt a8305bc6d8 reftable/block: introduce macro to initialize struct block_iter
There are a bunch of locations where we initialize members of `struct
block_iter`, which makes it harder than necessary to expand this struct
to have additional members. Unify the logic via a new `BLOCK_ITER_INIT`
macro that initializes all members.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:17 -08:00
Patrick Steinhardt 829231dc20 reftable/merged: reuse buffer to compute record keys
When iterating over entries in the merged iterator's queue, we compute
the key of each of the entries and write it into a buffer. We do not
reuse the buffer though and thus re-allocate it on every iteration,
which is wasteful given that we never transfer ownership of the
allocated bytes outside of the loop.

Refactor the code to reuse the buffer. This also fixes a potential
memory leak when `merged_iter_advance_subiter()` returns an error.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:16 -08:00
Patrick Steinhardt 9abda98149 reftable/stack: fix use of unseeded randomness
When writing a new reftable stack, Git will first create the stack with
a random suffix so that concurrent updates will not try to write to the
same file. This random suffix is computed via a call to rand(3P). But we
never seed the function via srand(3P), which means that the suffix is in
fact always the same.

Fix this bug by using `git_rand()` instead, which does not need to be
initialized. While this function is likely going to be slower depending
on the platform, this slowness should not matter in practice as we only
use it when writing a new reftable stack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:16 -08:00
Patrick Steinhardt 3054fbd93e reftable/stack: fix stale lock when dying
When starting a transaction via `reftable_stack_init_addition()`, we
create a lockfile for the reftable stack itself which we'll write the
new list of tables to. But if we terminate abnormally e.g. via a call to
`die()`, then we do not remove the lockfile. Subsequent executions of
Git which try to modify references will thus fail with an out-of-date
error.

Fix this bug by registering the lock as a `struct tempfile`, which
ensures automatic cleanup for us.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:16 -08:00
Patrick Steinhardt d779996a10 reftable/stack: reuse buffers when reloading stack
In `reftable_stack_reload_once()` we iterate over all the tables added
to the stack in order to figure out whether any of the tables needs to
be reloaded. We use a set of buffers in this context to compute the
paths of these tables, but discard those buffers on every iteration.
This is quite wasteful given that we do not need to transfer ownership
of the allocated buffer outside of the loop.

Refactor the code to instead reuse the buffers to reduce the number of
allocations we need to do. Note that we do not have to manually reset
the buffer because `stack_filename()` does this for us already.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:16 -08:00
Patrick Steinhardt 5c086453ff reftable/stack: perform auto-compaction with transactional interface
Whenever updating references or reflog entries in the reftable stack, we
need to add a new table to the stack, thus growing the stack's length by
one. The stack can grow to become quite long rather quickly, leading to
performance issues when trying to read records. But besides performance
issues, this can also lead to exhaustion of file descriptors very
rapidly as every single table requires a separate descriptor when
opening the stack.

While git-pack-refs(1) fixes this issue for us by merging the tables, it
runs too irregularly to keep the length of the stack within reasonable
limits. This is why the reftable stack has an auto-compaction mechanism:
`reftable_stack_add()` will call `reftable_stack_auto_compact()` after
its added the new table, which will auto-compact the stack as required.

But while this logic works alright for `reftable_stack_add()`, we do not
do the same in `reftable_addition_commit()`, which is the transactional
equivalent to the former function that allows us to write multiple
updates to the stack atomically. Consequentially, we will easily run
into file descriptor exhaustion in code paths that use many separate
transactions like e.g. non-atomic fetches.

Fix this issue by calling `reftable_stack_auto_compact()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:16 -08:00
Patrick Steinhardt 15f98b602f reftable/stack: verify that reftable_stack_add() uses auto-compaction
While we have several tests that check whether we correctly perform
auto-compaction when manually calling `reftable_stack_auto_compact()`,
we don't have any tests that verify whether `reftable_stack_add()` does
call it automatically. Add one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:16 -08:00
Patrick Steinhardt 85a8c899ce reftable: handle interrupted writes
There are calls to write(3P) where we don't properly handle interrupts.
Convert them to use `write_in_full()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:16 -08:00
Patrick Steinhardt 917a2b3ce9 reftable: handle interrupted reads
There are calls to pread(3P) and read(3P) where we don't properly handle
interrupts. Convert them to use `pread_in_full()` and `read_in_full()`,
respectively.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:16 -08:00
Patrick Steinhardt e32b8ece64 reftable: wrap EXPECT macros in do/while
The `EXPECT` macros used by the reftable test framework are all using a
single `if` statement with the actual condition. This results in weird
syntax when using them in if/else statements like the following:

```
if (foo)
	EXPECT(foo == 2)
else
	EXPECT(bar == 2)
```

Note that there need not be a trailing semicolon. Furthermore, it is not
immediately obvious whether the else now belongs to the `if (foo)` or
whether it belongs to the expanded `if (foo == 2)` from the macro.

Fix this by wrapping the macros in a do/while loop.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-11 07:23:15 -08:00