Commit graph

20788 commits

Author SHA1 Message Date
John Cai a4cf900ee7 diff: teach diff to read algorithm from diff driver
It can be useful to specify diff algorithms per file type. For example,
one may want to use the minimal diff algorithm for .json files, another
for .c files, etc.

The diff machinery already checks attributes for a diff driver. Teach
the diff driver parser a new type "algorithm" to look for in the
config, which will be used if a driver has been specified through the
attributes.

Enforce precedence of the diff algorithm by favoring the command line
option, then looking at the driver attributes & config combination, then
finally the diff.algorithm config.

To enforce precedence order, use a new `ignore_driver_algorithm` member
during options parsing to indicate the diff algorithm was set via command
line args.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-21 09:29:10 -08:00
Phillip Wood 16b3880dd7 rebase -i: check labels and refs when parsing todo list
Check that the argument to the "label" and "update-ref" commands is a
valid refname when the todo list is parsed rather than waiting until the
command is executed. This means that the user can deal with any errors
at the beginning of the rebase rather than having it stop halfway
through due to a typo in a label name. The "update-ref" command is
changed to reject single level refs as it is all to easy to type
"update-ref branch" which is incorrect rather than "update-ref
refs/heads/branch"

Note that it is not straight forward to check the arguments to "reset"
and "merge" commands as they may be any revision, not just a refname and
we do not have an equivalent of check_refname_format() for revisions.

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-21 09:18:37 -08:00
Patrick Steinhardt 6eb095d787 delta-islands: fix segfault when freeing island marks
In 647982bb71 (delta-islands: free island_marks and bitmaps, 2023-02-03)
we have introduced logic to free `island_marks` in order to reduce heap
memory usage in git-pack-objects(1). This commit is causing segfaults in
the case where this Git command does not load delta islands at all, e.g.
when reading object IDs from standard input. One such crash can be hit
when using repacking multi-pack-indices with delta islands enabled.

The root cause of this bug is that we unconditionally dereference the
`island_marks` variable even in the case where it is a `NULL` pointer,
which is fixed by making it conditional. Note that we still leave the
logic in place to set the pointer to `-1` to detect use-after-free bugs
even when there are no loaded island marks at all.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-21 09:15:04 -08:00
René Scharfe fd2da4b1ea archive: add --mtime
Allow users to specify the modification time of archive entries.  The
new option --mtime uses approxidate() to parse a time specification and
overrides the default of using the current time for trees and the commit
time for tags and commits.  It can be used to create a reproducible
archive for a tree, or to use a specific mtime without creating a commit
with GIT_COMMITTER_DATE set.

This implementation doesn't support the negated form of the new option,
i.e. --no-mtime is not accepted.  It is not possible to have no mtime at
all.  We could use the Unix epoch or revert to the default behavior, but
since negation is not necessary for the intended use it's left undecided
for now.

Requested-by: Raul E Rangel <rrangel@chromium.org>
Suggested-by: demerphq <demerphq@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-18 09:29:13 -08:00
Junio C Hamano 50bebf98d9 format.attach: allow empty value to disable multi-part messages
When a lower precedence configuration file (e.g. /etc/gitconfig)
defines format.attach in any way, there was no way to disable it in
a more specific configuration file (e.g. $HOME/.gitconfig).

Change the behaviour of setting it to an empty string.  It used to
mean that the result is still a multipart message with only dashes
used as a multi-part separator, but now it resets the setting to
the default (which would be to give an inline patch, unless other
command line options are in effect).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-17 15:43:09 -08:00
Jeff King 3b0ebb7a8d t0066: drop setup of "dir5"
The symlink setup in t0066 makes several directories with links, dir4
through dir6. But ever since dir5 was introduced in fa1da7d2ee
(dir-iterator: add flags parameter to dir_iterator_begin, 2019-07-10),
it has never actually been used. It was left over from an earlier
iteration of the patch which tried to handle recursive symlinks
specially, as seen in:

  https://lore.kernel.org/git/20190502144829.4394-7-matheus.bernardino@usp.br/

It's not hurting any of the existing tests to be there, but the extra
setup is confusing to anybody trying to read and understand the tests.
Let's drop the extra directory, and we'll rename "dir6" to "dir5" so
nobody wonders whether the gap in naming is important.

Helped-by: Matheus Tavares Bernardino <matheus.tavb@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-16 17:55:42 -08:00
Jeff King 29ae2c9e74 add basic http proxy tests
We do not test our http proxy functionality at all in the test suite, so
this is a pretty big blind spot. Let's at least add a basic check that
we can go through an authenticating proxy to perform a clone.

A few notes on the implementation:

  - I'm using a single apache instance to proxy to itself. This seems to
    work fine in practice, and we can check with a test that this rather
    unusual setup is doing what we expect.

  - I've put the proxy tests into their own script, and it's the only
    one which loads the apache proxy config. If any platform can't
    handle this (e.g., doesn't have the right modules), the start_httpd
    step should fail and gracefully skip the rest of the script (but all
    the other http tests in existing scripts will continue to run).

  - I used a separate passwd file to make sure we don't ever get
    confused between proxy and regular auth credentials. It's using the
    antiquated crypt() format. This is a terrible choice security-wise
    in the modern age, but it's what our existing passwd file uses, and
    should be portable. It would probably be reasonable to switch both
    of these to bcrypt, but we can do that in a separate patch.

  - On the client side, we test two situations with credentials: when
    they are present in the url, and when the username is present but we
    prompt for the password. I think we should be able to handle the
    case that _neither_ is present, but an HTTP 407 causes us to prompt
    for them. However, this doesn't seem to work. That's either a bug,
    or at the very least an opportunity for a feature, but I punted on
    it for now. The point of this patch is just getting basic coverage,
    and we can explore possible deficiencies later.

  - this doesn't work with LIB_HTTPD_SSL. This probably would be
    valuable to have, as https over an http proxy is totally different
    (it uses CONNECT to tunnel the session). But adding in
    mod_proxy_connect and some basic config didn't seem to work for me,
    so I punted for now. Much of the rest of the test suite does not
    currently work with LIB_HTTPD_SSL either, so we shouldn't be making
    anything much worse here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-16 16:24:23 -08:00
Taylor Blau e00e56a7df dir-iterator: drop unused DIR_ITERATOR_FOLLOW_SYMLINKS
The `FOLLOW_SYMLINKS` flag was added to the dir-iterator API in
fa1da7d2ee (dir-iterator: add flags parameter to dir_iterator_begin,
2019-07-10) in order to follow symbolic links while traversing through a
directory.

`FOLLOW_SYMLINKS` gained its first caller in ff7ccc8c9a (clone: use
dir-iterator to avoid explicit dir traversal, 2019-07-10), but it was
subsequently removed in 6f054f9fb3 (builtin/clone.c: disallow `--local`
clones with symlinks, 2022-07-28).

Since then, we've held on to the code for `DIR_ITERATOR_FOLLOW_SYMLINKS`
in the name of making minimally invasive changes during a security
embargo.

In fact, we even changed the dir-iterator API in bffc762f87
(dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS,
2023-01-24) without having any non-test callers of that flag.

Now that we're past those security embargo(s), let's finalize our
cleanup of the `DIR_ITERATOR_FOLLOW_SYMLINKS` code and remove its
implementation since there are no remaining callers.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-16 16:21:56 -08:00
Junio C Hamano 58eab6ff13 test-genzeros: avoid raw write(2)
This test helper feeds 256kB of data at once to a single invocation
of the write(2) system call, which may be too much for some
platforms.

Call our xwrite() wrapper that knows to honor MAX_IO_SIZE limit and
cope with short writes due to EINTR instead, and die a bit more
loudly by calling die_errno() when xwrite() indicates an error.

Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-16 08:30:38 -08:00
Junio C Hamano 59397e9b7e Merge branch 'cw/doc-pushurl-vs-url'
Doc update.

* cw/doc-pushurl-vs-url:
  Documentation: clarify multiple pushurls vs urls
2023-02-15 17:11:54 -08:00
Junio C Hamano 06bca9708a Merge branch 'ab/retire-scripted-add-p'
Finally retire the scripted "git add -p/-i" implementation and have
everybody use the one reimplemented in C.

* ab/retire-scripted-add-p:
  docs & comments: replace mentions of "git-add--interactive.perl"
  add API: remove run_add_interactive() wrapper function
  add: remove "add.interactive.useBuiltin" & Perl "git add--interactive"
2023-02-15 17:11:53 -08:00
Junio C Hamano 063ec7b3b8 Merge branch 'kf/t5000-modernise'
Test clean-up.

* kf/t5000-modernise:
  t5000: modernise archive and :(glob) test
2023-02-15 17:11:53 -08:00
Junio C Hamano 4a6e6b0d5b Merge branch 'ar/userdiff-java-update'
Userdiff regexp update for Java language.

* ar/userdiff-java-update:
  userdiff: support Java sealed classes
  userdiff: support Java record types
  userdiff: support Java type parameters
2023-02-15 17:11:52 -08:00
Junio C Hamano a232de58f2 Merge branch 'ab/sequencer-unleak'
Plug leaks in sequencer subsystem and its users.

* ab/sequencer-unleak:
  commit.c: free() revs.commit in get_fork_point()
  builtin/rebase.c: free() "options.strategy_opts"
  sequencer.c: always free() the "msgbuf" in do_pick_commit()
  builtin/rebase.c: fix "options.onto_name" leak
  builtin/revert.c: move free-ing of "revs" to replay_opts_release()
  sequencer API users: fix get_replay_opts() leaks
  sequencer.c: split up sequencer_remove_state()
  rebase: use "cleanup" pattern in do_interactive_rebase()
2023-02-15 17:11:52 -08:00
Junio C Hamano 4f59836451 Merge branch 'ds/bundle-uri-5'
The bundle-URI subsystem adds support for creation-token heuristics
to help incremental fetches.

* ds/bundle-uri-5:
  bundle-uri: test missing bundles with heuristic
  bundle-uri: store fetch.bundleCreationToken
  fetch: fetch from an external bundle URI
  bundle-uri: drop bundle.flag from design doc
  clone: set fetch.bundleURI if appropriate
  bundle-uri: download in creationToken order
  bundle-uri: parse bundle.<id>.creationToken values
  bundle-uri: parse bundle.heuristic=creationToken
  t5558: add tests for creationToken heuristic
  bundle: verify using check_connected()
  bundle: test unbundling with incomplete history
2023-02-15 17:11:52 -08:00
Johannes Schindelin ad6b320756 gpg: do show gpg's error message upon failure
There are few things more frustrating when signing a commit fails than
reading a terse "error: gpg failed to sign the data" message followed by
the unsurprising "fatal: failed to write commit object" message.

In many cases where signing a commit or tag fails, `gpg` actually said
something helpful, on its stderr, and Git even consumed that, but then
keeps mum about it.

Teach Git to stop withholding that rather important information.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 08:55:24 -08:00
Johannes Schindelin 8300d15d5e t7510: add a test case that does not need gpg
This test case not only increases test coverage in setups without
working gpg, but also prepares for verifying that the error message of
`gpg.program` is shown upon failure.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 08:55:22 -08:00
Jeff King 613bef56b8 shorten_unambiguous_ref(): avoid sscanf()
To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just
"foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop
the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match
that against the refname, pulling the "%s" content into a separate
buffer.

This has a few downsides:

  - sscanf("%s") reportedly misbehaves on macOS with some input and
    locale combinations, returning a partial or garbled string. See
    this thread:

      https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/

  - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD"
    rule would never pull "origin" out of "refs/remotes/origin/HEAD".
    Instead it always produced "origin/HEAD", which is redundant with
    the "refs/remotes/%s" rule.

  - scanf in general is an error-prone interface. For example, scanning
    for "%s" will copy bytes into a destination string, which must have
    been correctly sized ahead of time to avoid a buffer overflow. In
    this case, the code is OK (the buffer is pessimistically sized to
    match the original string, which should give us a maximum). But in
    general, we do not want to encourage people to use scanf at all.

So instead, let's note that our lookup rules are not arbitrary format
strings, but all contain exactly one "%.*s" placeholder. We already rely
on this, both for lookup (we feed the lookup format along with exactly
one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s"
to "%s", and then insist that sscanf() finds exactly one result).

We can parse this manually by just matching the bytes that occur before
and after the "%.*s" placeholder. While we have a few extra lines of
parsing code, the result is arguably simpler, as can skip the
preprocessing step and its tricky memory management entirely.

The in-code comments should explain the parsing strategy, but there's
one subtle change here. The original code allocated a single buffer, and
then overwrote it in each loop iteration, since that's the only option
sscanf() gives us. But our parser can actually return a ptr/len combo
for the matched string, which is all we need (since we just feed it back
to the lookup rules with "%.*s"), and then copy it only when returning
to the caller.

There are a few new tests here, all using symbolic-ref (the code can be
triggered in many ways, but symrefs are convenient in that we don't need
to create a real ref, which avoids any complications from the filesystem
munging the name):

  - the first covers the real-world case which misbehaved on macOS.
    Setting LC_ALL is required to trigger the problem there (since
    otherwise our tests use LC_ALL=C), and hopefully is at worst simply
    ignored on other systems (and doesn't cause libc to complain, etc,
    on systems without that locale).

  - the second covers the "origin/HEAD" case as discussed above, which
    is now fixed

  - the remainder are for "weird" cases that work both before and after
    this patch, but would be easy to get wrong with off-by-one problems
    in the parsing (and came out of discussions and earlier iterations
    of the patch that did get them wrong).

  - absent here are tests of boring, expected-to-work cases like
    "refs/heads/foo", etc. Those are covered all over the test suite
    both explicitly (for-each-ref's refname:short) and implicitly (in
    the output of git-status, etc).

Reported-by: 孟子易 <mengziyi540841@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 08:53:17 -08:00
Junio C Hamano 2f80d1b42e Merge branch 'rj/branch-copy-and-rename' into maint-2.39
Fix a pair of bugs in 'git branch'.

* rj/branch-copy-and-rename:
  branch: force-copy a branch to itself via @{-1} is a no-op
2023-02-14 14:15:55 -08:00
Junio C Hamano 8ca2b1f248 Merge branch 'rs/t3920-crlf-eating-grep-fix' into maint-2.39
Test fix.

* rs/t3920-crlf-eating-grep-fix:
  t3920: support CR-eating grep
2023-02-14 14:15:54 -08:00
Junio C Hamano 763ae829a3 Merge branch 'js/t3920-shell-and-or-fix' into maint-2.39
Test fix.

* js/t3920-shell-and-or-fix:
  t3920: don't ignore errors of more than one command with `|| true`
2023-02-14 14:15:54 -08:00
Junio C Hamano 81b216e4f7 Merge branch 'ab/t4023-avoid-losing-exit-status-of-diff' into maint-2.39
Test fix.

* ab/t4023-avoid-losing-exit-status-of-diff:
  t4023: fix ignored exit codes of git
2023-02-14 14:15:54 -08:00
Junio C Hamano 54941a5316 Merge branch 'ab/t7600-avoid-losing-exit-status-of-git' into maint-2.39
Test fix.

* ab/t7600-avoid-losing-exit-status-of-git:
  t7600: don't ignore "rev-parse" exit code in helper
2023-02-14 14:15:54 -08:00
Junio C Hamano 2509d0198c Merge branch 'ab/t5314-avoid-losing-exit-status' into maint-2.39
Test fix.

* ab/t5314-avoid-losing-exit-status:
  t5314: check exit code of "git"
2023-02-14 14:15:53 -08:00
Junio C Hamano db2a91ba36 Merge branch 'rs/t4205-do-not-exit-in-test-script' into maint-2.39
Test fix.

* rs/t4205-do-not-exit-in-test-script:
  t4205: don't exit test script on failure
2023-02-14 14:15:53 -08:00
Junio C Hamano 1f071460d3 Merge branch 'rs/ls-tree-path-expansion-fix' into maint-2.39
"git ls-tree --format='%(path) %(path)' $tree $path" showed the
path three times, which has been corrected.

* rs/ls-tree-path-expansion-fix:
  ls-tree: remove dead store and strbuf for quote_c_style()
  ls-tree: fix expansion of repeated %(path)
2023-02-14 14:15:52 -08:00
Junio C Hamano 4950677b48 Merge branch 'ws/single-file-cone' into maint-2.39
The logic to see if we are using the "cone" mode by checking the
sparsity patterns has been tightened to avoid mistaking a pattern
that names a single file as specifying a cone.

* ws/single-file-cone:
  dir: check for single file cone patterns
2023-02-14 14:15:51 -08:00
Junio C Hamano f8382a6396 Merge branch 'jk/ext-diff-with-relative' into maint-2.39
"git diff --relative" did not mix well with "git diff --ext-diff",
which has been corrected.

* jk/ext-diff-with-relative:
  diff: drop "name" parameter from prepare_temp_file()
  diff: clean up external-diff argv setup
  diff: use filespec path to set up tempfiles for ext-diff
2023-02-14 14:15:51 -08:00
Junio C Hamano 7cbfd0e572 Merge branch 'ab/bundle-wo-args' into maint-2.39
Fix to a small regression in 2.38 days.

* ab/bundle-wo-args:
  bundle <cmd>: have usage_msg_opt() note the missing "<file>"
  builtin/bundle.c: remove superfluous "newargc" variable
  bundle: don't segfault on "git bundle <subcmd>"
2023-02-14 14:15:50 -08:00
Junio C Hamano 725f293355 Merge branch 'lk/line-range-parsing-fix' into maint-2.39
When given a pattern that matches an empty string at the end of a
line, the code to parse the "git diff" line-ranges fell into an
infinite loop, which has been corrected.

* lk/line-range-parsing-fix:
  line-range: fix infinite loop bug with '$' regex
2023-02-14 14:15:49 -08:00
Junio C Hamano c867e4fa18 Sync with Git 2.39.2 2023-02-13 17:03:55 -08:00
René Scharfe 567342fc77 test-ctype: test iscntrl, ispunct, isxdigit and isprint
Test the character classifiers added by 1c149ab2dd (ctype: support
iscntrl, ispunct, isxdigit and isprint, 2012-10-15) and 0fcec2ce54
(format-patch: make rfc2047 encoding more strict, 2012-10-18).

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-13 13:36:05 -08:00
René Scharfe 2c17de8b37 test-ctype: test islower and isupper
Test the character classifiers added by 43ccdf56ec (ctype: implement
islower/isupper macro, 2012-02-10).

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-13 13:36:05 -08:00
René Scharfe d5071be5ed test-ctype: test isascii
Test the character classifier added by c2e9364a06 (cleanup: add
isascii(), 2009-03-07).  It returns 1 for NUL as well, which requires
special treatment, as our string-based tester can't find it with
strcmp(3).  Allow NUL to be given as the first character in a class
specification string.  This has the downside of no longer supporting
the empty string, but that's OK since we are not interested in testing
character classes with no members.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-13 13:36:05 -08:00
Junio C Hamano fd2d4c135e gpg-interface: lazily initialize and read the configuration
Instead of forcing the porcelain commands to always read the
configuration variables related to the signing and verifying
signatures, lazily initialize the necessary subsystem on demand upon
the first use.

This hopefully would make it more future-proof as we do not have to
think and decide whether we should call git_gpg_config() in the
git_config() callback for each command.

A few git_config() callback functions that used to be custom
callbacks are now just a thin wrapper around git_default_config().
We could further remove, git_FOO_config and replace calls to
git_config(git_FOO_config) with git_config(git_default_config), but
to make it clear which ones are affected and the effect is only the
removal of git_gpg_config(), it is vastly preferred not to do such a
change in this step (they can be done on top once the dust settled).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-09 17:01:27 -08:00
Junio C Hamano a674c7edcf Merge branch 'jk/httpd-test-updates'
Test update.

* jk/httpd-test-updates:
  t/lib-httpd: increase ssl key size to 2048 bits
  t/lib-httpd: drop SSLMutex config
  t/lib-httpd: bump required apache version to 2.4
  t/lib-httpd: bump required apache version to 2.2
2023-02-09 14:40:47 -08:00
Elijah Newren b2182a8730 name-rev: fix names by dropping taggerdate workaround
Commit 7550424804 ("name-rev: include taggerdate in considering the best
name", 2016-04-22) introduced the idea of using taggerdate in the
criteria for selecting the best name.  At the time, a certain commit in
linux.git -- namely, aed06b9cfcab -- was being named by name-rev as
    v4.6-rc1~9^2~792
which, while correct, was very suboptimal.  Some investigation found
that tweaking the MERGE_TRAVERSAL_WEIGHT to lower it could give
alternate answers such as
    v3.13-rc7~9^2~14^2~42
or
    v3.13~5^2~4^2~2^2~1^2~42
A manual solution involving looking at tagger dates came up with
    v3.13-rc1~65^2^2~42
which is much nicer.  That workaround was then implemented in name-rev.

Unfortunately, the taggerdate heuristic is causing bugs.  I was pointed
to a case in a private repository where name-rev reports a name of the
form
    v2022.10.02~86
when users expected to see one of the form
    v2022.10.01~2
(I've modified the names and numbers a bit from the real testcase.)  As
you can probably guess, v2022.10.01 was created after v2022.10.02 (by a
few hours), even though it pointed to an older commit.  While the
condition is unusual even in the repository in question, it is not the
only problematic set of tags in that repository.  The taggerdate logic
is causing problems.

Further, it turns out that this taggerdate heuristic isn't even helping
anymore.  Due to the fix to naming logic in 3656f84278 ("name-rev:
prefer shorter names over following merges", 2021-12-04), we get
improved names without the taggerdate heuristic.  For the original
commit of interest in linux.git, a modern git without the taggerdate
heuristic still provides the same optimal answer of interest, namely:
    v3.13-rc1~65^2^2~42

So, the taggerdate is no longer providing benefit, and it is causing
problems.  Simply get rid of it.

However, note that "taggerdate" as a variable is used to store things
besides a taggerdate these days.  Ever since commit ef1e74065c
("name-rev: favor describing with tags and use committer date to
tiebreak", 2017-03-29), this has been used to store committer dates and
there it is used as a fallback tiebreaker (as opposed to a primary
criteria overriding effective distance calculations).  We do not want to
remove that fallback tiebreaker, so not all instances of "taggerdate"
are removed in this change.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-09 09:01:36 -08:00
Andrei Rybak 93d52ed050 userdiff: support Java sealed classes
A new kind of class was added in Java 17 -- sealed classes.[1]  This
feature includes several new keywords that may appear in a declaration
of a class.  New modifiers before name of the class: "sealed" and
"non-sealed", and a clause after name of the class marked by keyword
"permits".

The current set of regular expressions in userdiff.c already allows the
modifier "sealed" and the "permits" clause, but not the modifier
"non-sealed", which is the first hyphenated keyword in Java.[2]  Allow
hyphen in the words that precede the name of type to match the
"non-sealed" modifier.

In new input file "java-sealed" for the test t4018-diff-funcname.sh, use
a Java code comment for the marker "RIGHT".  This workaround is needed,
because the name of the sealed class appears on the line of code that
has the "ChangeMe" marker.

[1] Detailed description in "JEP 409: Sealed Classes"
    https://openjdk.org/jeps/409
[2] "JEP draft: Keyword Management for the Java Language"
    https://openjdk.org/jeps/8223002

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-08 12:57:13 -08:00
Andrei Rybak 575e6fcfcc userdiff: support Java record types
A new kind of class was added in Java 16 -- records.[1]  The syntax of
records is similar to regular classes with one important distinction:
the name of the record class is followed by a mandatory list of
components.  The list is enclosed in parentheses, it may be empty, and
it may immediately follow the name of the class or type parameters, if
any, with or without separating whitespace.  For example:

    public record Example(int i, String s) {
    }

    public record WithTypeParameters<A, B>(A a, B b, String s) {
    }

    record SpaceBeforeComponents (String comp1, int comp2) {
    }

Support records in the builtin userdiff pattern for Java.  Add "record"
to the alternatives of keywords for kinds of class.

Allowing matching various possibilities for the type parameters and/or
list of the components of a record has already been covered by the
preceding patch.

[1] detailed description is available in "JEP 395: Records"
    https://openjdk.org/jeps/395

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-08 12:57:11 -08:00
Andrei Rybak 39226a8dac userdiff: support Java type parameters
A class or interface in Java can have type parameters following the name
in the declared type, surrounded by angle brackets (paired less than and
greater than signs).[2]   The type parameters -- `A` and `B` in the
examples -- may follow the class name immediately:

    public class ParameterizedClass<A, B> {
    }

or may be separated by whitespace:

    public class SpaceBeforeTypeParameters <A, B> {
    }

A part of the builtin userdiff pattern for Java matches declarations of
classes, enums, and interfaces.  The regular expression requires at
least one whitespace character after the name of the declared type.
This disallows matching for opening angle bracket of type parameters
immediately after the name of the type.  Mandatory whitespace after the
name of the type also disallows using the pattern in repositories with a
fairly common code style that puts braces for the body of a class on
separate lines:

    class WithLineBreakBeforeOpeningBrace
    {
    }

Support matching Java code in more diverse code styles and declarations
of classes and interfaces with type parameters immediately following the
name of the type in the builtin userdiff pattern for Java.  Do so by
just matching anything until the end of the line after the keywords for
the kind of type being declared.

[1] Since Java 5 released in 2004.
[2] Detailed description is available in the Java Language
    Specification, sections "Type Variables" and "Parameterized Types":
    https://docs.oracle.com/javase/specs/jls/se17/html/jls-4.html#jls-4.4

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-08 12:56:57 -08:00
Emily Shaffer 0414b3891c hook: support a --to-stdin=<path> option
Expose the "path_to_stdin" API added in the preceding commit in the
"git hook run" command.

For now we won't be using this command interface outside of the tests,
but exposing this functionality makes it easier to test the hook
API. The plan is to use this to extend the "sendemail-validate"
hook[1][2].

1. https://lore.kernel.org/git/ad152e25-4061-9955-d3e6-a2c8b1bd24e7@amd.com
2. https://lore.kernel.org/git/20230120012459.920932-1-michael.strawbridge@amd.com

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-08 12:50:03 -08:00
Junio C Hamano 3fe6612d4c Merge branch 'ds/scalar-ignore-cron-error'
Allow "scalar" to warn but continue when its periodic maintenance
feature cannot be enabled.

* ds/scalar-ignore-cron-error:
  scalar: only warn when background maintenance fails
  t921*: test scalar behavior starting maintenance
  t: allow 'scalar' in test_must_fail
2023-02-08 09:14:42 -08:00
Calvin Wan d390e08076 Documentation: clarify multiple pushurls vs urls
In a remote with multiple configured URLs, `git remote -v` shows the
correct url that fetch uses. However, `git config remote.<remote>.url`
returns the last defined url instead. This discrepancy can cause
confusion for users with a remote defined as such, since any url
defined after the first essentially acts as a pushurl.

Add documentation to clarify how fetch interacts with multiple urls
and how push interacts with multiple pushurls and urls.

Add test affirming interaction between fetch and multiple urls.

Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-07 11:02:27 -08:00
Ævar Arnfjörð Bjarmason 0c10ed19c4 commit.c: free() revs.commit in get_fork_point()
Fix a memory leak that's been with us since d96855ff51 (merge-base:
teach "--fork-point" mode, 2013-10-23).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 16:03:53 -08:00
Ævar Arnfjörð Bjarmason 94ad545d47 builtin/rebase.c: fix "options.onto_name" leak
Similar to the existing "squash_onto_name" added in [1] we need to
free() the xstrdup()'d "options.onto.name" added for "--keep-base" in
[2]..

1. 9dba809a69 (builtin rebase: support --root, 2018-09-04)
2. 414d924beb (rebase: teach rebase --keep-base, 2019-08-27)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 16:03:52 -08:00
Ævar Arnfjörð Bjarmason 9ff2f06069 sequencer API users: fix get_replay_opts() leaks
Make the replay_opts_release() function added in the preceding commit
non-static, and use it for freeing the "struct replay_opts"
constructed for "rebase" and "revert".

To safely call our new replay_opts_release() we'll need to stop
calling it in sequencer_remove_state(), and instead call it where we
allocate the "struct replay_opts" itself.

This is because in e.g. do_interactive_rebase() we construct a "struct
replay_opts" with "get_replay_opts()", and then call
"complete_action()". If we get far enough in that function without
encountering errors we'll call "pick_commits()" which (indirectly)
calls sequencer_remove_state() at the end.

But if we encounter errors anywhere along the way we'd punt out early,
and not free() the memory we allocated. Remembering whether we
previously called sequencer_remove_state() would be a hassle.

Using a FREE_AND_NULL() pattern would also work, as it would be safe
to call replay_opts_release() repeatedly. But let's fix this properly
instead, by having the owner of the data free() it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 16:03:52 -08:00
Ævar Arnfjörð Bjarmason c65d18cb52 push: free_refs() the "local_refs" in set_refspecs()
Fix a memory leak that's been with us since this code was added in
ca02465b41 (push: use remote.$name.push as a refmap, 2013-12-03).

The "remote = remote_get(...)" added in the same commit would seem to
leak based only on the context here, but that function is a wrapper
for sticking the remotes we fetch into "the_repository->remote_state".

See fd3cb0501e (remote: move static variables into per-repository
struct, 2021-11-17) for the addition of code in repository.c that
free's the "remote" allocated here.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:34:40 -08:00
Ævar Arnfjörð Bjarmason 1fdd31cf52 receive-pack: release the linked "struct command *" list
Fix a memory leak that's been with us since this code was introduced
in [1]. Later in [2] we started using FLEX_ALLOC_MEM() to allocate the
"struct command *".

1. 575f497456 (Add first cut at "git-receive-pack", 2005-06-29)
2. eb1af2df0b (git-receive-pack: start parsing ref update commands,
   2005-06-29)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:34:40 -08:00
Ævar Arnfjörð Bjarmason 345e216f63 builtin/merge.c: use fixed strings, not "strbuf", fix leak
Follow-up 465028e0e2 (merge: add missing strbuf_release(),
2021-10-07) and address the "msg" memory leak in this block. We could
free "&msg" before the "goto done" here, but even better is to avoid
allocating it in the first place.

By repeating the "Fast-forward" string here we can avoid using a
"struct strbuf" altogether.

Suggested-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:34:39 -08:00
Ævar Arnfjörð Bjarmason 9f24f3c719 worktree: fix a trivial leak in prune_worktrees()
We were leaking both the "struct strbuf" in prune_worktrees(), as well
as the "path" we got from should_prune_worktree(). Since these were
the only two uses of the "struct string_list" let's change it to a
"DUP" and push these to it with "string_list_append_nodup()".

For the string_list_append_nodup() we could also string_list_append()
the main_path.buf, and then strbuf_release(&main_path) right away. But
doing it this way avoids an allocation, as we already have the "struct
strbuf" prepared for appending to "kept".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:34:38 -08:00
Ævar Arnfjörð Bjarmason 90428ddccf repack: fix leaks on error with "goto cleanup"
In cmd_repack() when we hit an error, replace "return ret" with "goto
cleanup" to ensure we free the necessary data structures.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:34:37 -08:00
Ævar Arnfjörð Bjarmason 7615cf94d2 various: add missing clear_pathspec(), fix leaks
Fix memory leaks resulting from a missing clear_pathspec().

- archive.c: Plug a leak in the "struct archiver_args", and
  clear_pathspec() the "pathspec" member that the "parse_pathspec_arg()"
  call in this function populates.

- builtin/clean.c: Fix a memory leak that's been with us since
  893d839970 (clean: convert to use parse_pathspec, 2013-07-14).

- builtin/reset.c: Add clear_pathspec() calls to cmd_reset(),
  including to the codepaths where we'd return early.

- builtin/stash.c: Call clear_pathspec() on the pathspec initialized
  in push_stash().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:34:37 -08:00
Ævar Arnfjörð Bjarmason b2e5d75d17 tests: mark tests as passing with SANITIZE=leak
When the "ab/various-leak-fixes" topic was merged in [1] only t6021
would fail if the tests were run in the
"GIT_TEST_PASSING_SANITIZE_LEAK=check" mode, i.e. to check whether we
marked all leak-free tests with "TEST_PASSES_SANITIZE_LEAK=true".

Since then we've had various tests starting to pass under
SANITIZE=leak. Let's mark those as passing, this is when they started
to pass, narrowed down with "git bisect":

- t5317-pack-objects-filter-objects.sh: In
  faebba436e (list-objects-filter: plug pattern_list leak, 2022-12-01).

- t3210-pack-refs.sh, t5613-info-alternate.sh,
  t7403-submodule-sync.sh: In 189e97bc4b (diff: remove parseopts member
  from struct diff_options, 2022-12-01).

- t1408-packed-refs.sh: In ab91f6b7c4 (Merge branch
  'rs/diff-parseopts', 2022-12-19).

- t0023-crlf-am.sh, t4152-am-subjects.sh, t4254-am-corrupt.sh,
  t4256-am-format-flowed.sh, t4257-am-interactive.sh,
  t5403-post-checkout-hook.sh: In a658e881c1 (am: don't pass strvec to
  apply_parse_options(), 2022-12-13)

- t1301-shared-repo.sh, t1302-repo-version.sh: In b07a819c05 (reflog:
  clear leftovers in reflog_expiry_cleanup(), 2022-12-13).

- t1304-default-acl.sh, t1410-reflog.sh,
  t5330-no-lazy-fetch-with-commit-graph.sh, t5502-quickfetch.sh,
  t5604-clone-reference.sh, t6014-rev-list-all.sh,
  t7701-repack-unpack-unreachable.sh: In b0c61be320 (Merge branch
  'rs/reflog-expiry-cleanup', 2022-12-26)

- t3800-mktag.sh, t5302-pack-index.sh, t5306-pack-nobase.sh,
  t5573-pull-verify-signatures.sh, t7612-merge-verify-signatures.sh: In
  69bbbe484b (hash-object: use fsck for object checks, 2023-01-18).

- t1451-fsck-buffer.sh: In 8e4309038f (fsck: do not assume
  NUL-termination of buffers, 2023-01-19).

- t6501-freshen-objects.sh: In abf2bb895b (Merge branch
  'jk/hash-object-fsck', 2023-01-30)

1. 9ea1378d04 (Merge branch 'ab/various-leak-fixes', 2022-12-14)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:34:36 -08:00
Ævar Arnfjörð Bjarmason 9fdc79ecba tests: don't lose misc "git" exit codes
Fix a few miscellaneous cases where:

- We lost the "git" exit code via "git ... | grep"
- Likewise by having a $(git) argument to git itself
- Used "test -z" to check that a command emitted no output, we can use
  "test_must_be_empty" and &&-chaining instead.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:30:42 -08:00
Ævar Arnfjörð Bjarmason 4bd0785dc2 tests: don't lose exit status with "test <op> $(git ...)"
As with the preceding commit, rewrite tests that ran "git" inside
command substitution and lost the exit status of "git" so that we
notice the failing "git". This time around we're converting cases that
didn't involve a containing sub-shell around the command substitution.

In the case of "t0060-path-utils.sh" and
"t2005-checkout-index-symlinks.sh" convert the relevant code to using
the modern style of indentation and newline wrapping while having to
change it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:30:42 -08:00
Ævar Arnfjörð Bjarmason c7e03b4e39 tests: don't lose "git" exit codes in "! ( git ... | grep )"
Change tests that would lose the "git" exit code via a negation
pattern to:

- In the case of "t0055-beyond-symlinks.sh" compare against the
  expected output instead.

  We could use the same pattern as in the "t3700-add.sh" below, doing
  so would have the advantage that if we added an earlier test we
  wouldn't need to adjust the "expect" output.

  But as "t0055-beyond-symlinks.sh" is a small and focused test (less
  than 40 lines in total) let's use "test_cmp" instead.

- For "t3700-add.sh" use "sed -n" to print the expected "bad" part,
  and use "test_must_be_empty" to assert that it's not there. If we used
  "grep" we'd get a non-zero exit code.

  We could use "test_expect_code 1 grep", but this is more consistent
  with existing patterns in the test suite.

  We can also remove a repeated invocation of "git ls-files" for the
  last test that's being modified in that file, and search the
  existing "files" output instead.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:30:42 -08:00
Ævar Arnfjörð Bjarmason 0cd1a8818d tests: don't lose exit status with "(cd ...; test <op> $(git ...))"
Rewrite tests that ran "git" inside command substitution and lost the
exit status of "git" so that we notice the failing "git".

Have them use modern patterns such as a "test_cmp" of the expected
outputs instead.

We'll fix more of these these in the subsequent commit, for now we're
only converting the cases where this loss of exit code was combined
with spawning a sub-shell.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:30:41 -08:00
Ævar Arnfjörð Bjarmason 62f3a45bb4 t/lib-patch-mode.sh: fix ignored exit codes
Fix code added in b319ef70a9 (Add a small patch-mode testing library,
2009-08-13) to use &&-chaining.

This avoids losing both the exit code of a "git" and the "cat"
processes.

This fixes cases where we'd have e.g. missed memory leaks under
SANITIZE=leak, this code doesn't leak now as far as I can tell, but I
discovered it while looking at leaks in related code.

For "verify_saved_head()" we could make use of "test_cmp_rev" with
some changes, but it uses "git rev-parse --verify", and this existing
test does not. I think it could safely use it, but let's avoid the
while-at-it change, and narrowly fix the exit code problem.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:30:41 -08:00
Ævar Arnfjörð Bjarmason fb18dd2831 auto-crlf tests: don't lose exit code in loops and outside tests
Change the functions which are called from within
"test_expect_success" to add the "|| return 1" idiom to their
for-loops, so we won't lose the exit code of "cp", "git" etc.

Then for those setup functions that aren't called from a
"test_expect_success" we need to put the setup code in a
"test_expect_success" as well. It would not be enough to properly
&&-chain these, as the calling code is the top-level script itself. As
we don't run the tests with "set -e" we won't report failing commands
at the top-level.

The "checkout" part of this would miss memory leaks under
SANITIZE=leak, this code doesn't leak (the relevant "git checkout"
leak has been fixed), but in a past version of git we'd continue past
this failure under SANITIZE=leak when these invocations had errored
out, even under "--immediate".

For checkout_files() we could run one test_expect_success() instead of
the 5 we run now in a loop.

But as this function added in [1] is already taking pains to split up
its setup into phases (there are 5 more "test_expect_success()" at the
end of it already, see [1]), let's follow that existing convention.

1. 343151dcbd (t0027: combinations of core.autocrlf, core.eol and text, 2014-06-29)

Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:30:41 -08:00
Ævar Arnfjörð Bjarmason 20b813d7d3 add: remove "add.interactive.useBuiltin" & Perl "git add--interactive"
Since [1] first released with Git v2.37.0 the built-in version of "add
-i" has been the default. That built-in implementation was added in
[2], first released with Git v2.25.0.

At this point enough time has passed to allow for finding any
remaining bugs in this new implementation, so let's remove the
fallback code.

As with similar migrations for "stash"[3] and "rebase"[4] we're
keeping a mention of "add.interactive.useBuiltin" in the
documentation, but adding a warning() to notify any outstanding users
that the built-in is now the default. As with [5] and [6] we should
follow-up in the future and eventually remove that warning.

1. 0527ccb1b5 (add -i: default to the built-in implementation,
   2021-11-30)
2. f83dff60a7 (Start to implement a built-in version of `git add
   --interactive`, 2019-11-13)
3. 8a2cd3f512 (stash: remove the stash.useBuiltin setting,
   2020-03-03)
4. d03ebd411c (rebase: remove the rebase.useBuiltin setting,
   2019-03-18)
5. deeaf5ee07 (stash: remove documentation for `stash.useBuiltin`,
   2022-01-27)
6. 9bcde4d531 (rebase: remove transitory rebase.useBuiltin setting &
   env, 2021-03-23)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 15:03:34 -08:00
Kostya Farber d912a603ed t5000: modernise archive and :(glob) test
To match present day coding guiding codelines let's:

- use <<-EOF, so we can indent all lines to the
  the same level for this test

- use <<\EOF to notify the reader that no interpolation
  is expected in the body

Signed-off-by: Kostya Farber <kostya.farber@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 14:14:20 -08:00
Johannes Schindelin 3aef76ffd4 Sync with 2.38.4
* maint-2.38:
  Git 2.38.4
  Git 2.37.6
  Git 2.36.5
  Git 2.35.7
  Git 2.34.7
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:43:39 +01:00
Johannes Schindelin 6487e9c459 Sync with 2.37.6
* maint-2.37:
  Git 2.37.6
  Git 2.36.5
  Git 2.35.7
  Git 2.34.7
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:43:28 +01:00
Johannes Schindelin 16004682f9 Sync with 2.36.5
* maint-2.36:
  Git 2.36.5
  Git 2.35.7
  Git 2.34.7
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:38:31 +01:00
Johannes Schindelin 40843216c5 Sync with 2.35.7
* maint-2.35:
  Git 2.35.7
  Git 2.34.7
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:37:52 +01:00
Johannes Schindelin 6a53a59bf9 Sync with 2.34.7
* maint-2.34:
  Git 2.34.7
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:29:44 +01:00
Johannes Schindelin a7237f5ae9 Sync with 2.33.7
* maint-2.33:
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:29:16 +01:00
Johannes Schindelin 87248c5933 Sync with 2.32.6
* maint-2.32:
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:25:56 +01:00
Johannes Schindelin aeb93d7da2 Sync with 2.31.7
* maint-2.31:
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:25:08 +01:00
Johannes Schindelin e14d6b8408 Sync with 2.30.8
* maint-2.30:
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:24:06 +01:00
Junio C Hamano a3033a68ac Merge branch 'ps/apply-beyond-symlink' into maint-2.30
Fix a vulnerability (CVE-2023-23946) that allows crafted input to trick
`git apply` into writing files outside of the working tree.

* ps/apply-beyond-symlink:
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2023-02-06 09:12:16 +01:00
Junio C Hamano 2c6e5b32aa Merge branch 'en/rebase-incompatible-opts'
"git rebase" often ignored incompatible options instead of
complaining, which has been corrected.

* en/rebase-incompatible-opts:
  rebase: provide better error message for apply options vs. merge config
  rebase: put rebase_options initialization in single place
  rebase: fix formatting of rebase --reapply-cherry-picks option in docs
  rebase: clarify the OPT_CMDMODE incompatibilities
  rebase: add coverage of other incompatible options
  rebase: fix incompatiblity checks for --[no-]reapply-cherry-picks
  rebase: fix docs about incompatibilities with --root
  rebase: remove --allow-empty-message from incompatible opts
  rebase: flag --apply and --merge as incompatible
  rebase: mark --update-refs as requiring the merge backend
2023-02-03 16:08:21 -08:00
Patrick Steinhardt fade728df1 apply: fix writing behind newly created symbolic links
When writing files git-apply(1) initially makes sure that none of the
files it is about to create are behind a symlink:

```
 $ git init repo
 Initialized empty Git repository in /tmp/repo/.git/
 $ cd repo/
 $ ln -s dir symlink
 $ git apply - <<EOF
 diff --git a/symlink/file b/symlink/file
 new file mode 100644
 index 0000000..e69de29
 EOF
 error: affected file 'symlink/file' is beyond a symbolic link
```

This safety mechanism is crucial to ensure that we don't write outside
of the repository's working directory. It can be fooled though when the
patch that is being applied creates the symbolic link in the first
place, which can lead to writing files in arbitrary locations.

Fix this by checking whether the path we're about to create is
beyond a symlink or not. Tightening these checks like this should be
fine as we already have these precautions in Git as explained
above. Ideally, we should update the check we do up-front before
starting to reflect the computed changes to the working tree so that
we catch this case as well, but as part of embargoed security work,
adding an equivalent check just before we try to write out a file
should serve us well as a reasonable first step.

Digging back into history shows that this vulnerability has existed
since at least Git v2.9.0. As Git v2.8.0 and older don't build on my
system anymore I cannot tell whether older versions are affected, as
well.

Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-03 14:41:31 -08:00
Jeff King b08edf709d t/lib-httpd: increase ssl key size to 2048 bits
Recent versions of openssl will refuse to work with 1024-bit RSA keys,
as they are considered insecure. I didn't track down the exact version
in which the defaults were tightened, but the Debian-package openssl 3.0
on my system yields:

  $ LIB_HTTPD_SSL=1 ./t5551-http-fetch-smart.sh -v -i
  [...]
  SSL Library Error: error:0A00018F:SSL routines::ee key too small
  1..0 # SKIP web server setup failed

This could probably be overcome with configuration, but that's likely
to be a headache (especially if it requires touching /etc/openssl).
Let's just pick a key size that's less outrageously out of date.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-01 10:10:34 -08:00
Jeff King d113449e26 t/lib-httpd: drop SSLMutex config
The SSL config enabled by setting LIB_HTTPD_SSL does not work with
Apache versions greater than 2.2, as more recent versions complain about
the SSLMutex directive. According to
https://httpd.apache.org/docs/current/upgrading.html:

  Directives AcceptMutex, LockFile, RewriteLock, SSLMutex,
  SSLStaplingMutex, and WatchdogMutexPath have been replaced with a
  single Mutex directive. You will need to evaluate any use of these
  removed directives in your 2.2 configuration to determine if they can
  just be deleted or will need to be replaced using Mutex.

Deleting this line will just use the system default, which seems
sensible. The original came as part of faa4bc35a0 (http-push: add
regression tests, 2008-02-27), but no specific reason is given there (or
on the mailing list) for its presence.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-01 10:10:34 -08:00
Jeff King edd060dc84 t/lib-httpd: bump required apache version to 2.4
Apache 2.4 has been out since early 2012, almost 11 years. And its
predecessor, 2.2, has been out of support since its last release in
2017, over 5 years ago. The last mention on the mailing list was from
around the same time, in this thread:

  https://lore.kernel.org/git/20171231023234.21215-1-tmz@pobox.com/

We can probably assume that 2.4 is available everywhere. And the stakes
are fairly low, as the worst case is that such a platform would skip the
http tests.

This lets us clean up a few minor version checks in the config file, but
also revert f1f2b45be0 (tests: adjust the configuration for Apache 2.2,
2016-05-09). Its technique isn't _too_ bad, but certainly required a bit
more explanation than the 2.4 version it replaced. I manually confirmed
that the test in t5551 still behaves as expected (if you replace
"cadabra" with "foo", the server correctly rejects the request).

It will also help future patches which will no longer have to deal with
conditional config for this old version.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-01 10:10:34 -08:00
Jeff King d762617079 t/lib-httpd: bump required apache version to 2.2
Apache 2.2 was released in 2005, almost 18 years ago. We can probably
assume that people are running a version at least that old (and the
stakes for removing it are fairly low, as the worst case is that they
would not run the http tests against their ancient version).

Dropping support for the older versions cleans up the config file a
little, and will also enable us to bump the required version further
(with more cleanups) in a future patch.

Note that the file actually checks for version 2.1. In apache's
versioning scheme, odd numbered versions are for development and even
numbers are for stable releases. So 2.1 and 2.2 are effectively the same
from our perspective.

Older versions would just fail to start, which would generally cause us
to skip the tests. However, we do have version detection code in
lib-httpd.sh which produces a nicer error message, so let's update that,
too. I didn't bother handling the case of "3.0", etc. Apache has been on
2.x for 21 years, with no signs of bumping the major version.  And if
they eventually do, I suspect there will be enough breaking changes that
we'd need to update more than just the numeric version check. We can
worry about that hypothetical when it happens.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-01 10:10:34 -08:00
Derrick Stolee 026df9e047 bundle-uri: test missing bundles with heuristic
The creationToken heuristic uses a different mechanism for downloading
bundles from the "standard" approach. Specifically: it uses a concrete
order based on the creationToken values and attempts to download as few
bundles as possible. It also modifies local config to store a value for
future fetches to avoid downloading bundles, if possible.

However, if any of the individual bundles has a failed download, then
the logic for the ordering comes into question. It is important to avoid
infinite loops, assigning invalid creation token values in config, but
also to be opportunistic as possible when downloading as many bundles as
seem appropriate.

These tests were used to inform the implementation of
fetch_bundles_by_token() in bundle-uri.c, but are being added
independently here to allow focusing on faulty downloads. There may be
more cases that could be added that result in modifications to
fetch_bundles_by_token() as interesting data shapes reveal themselves in
real scenarios.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee c429bed102 bundle-uri: store fetch.bundleCreationToken
When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order. This process stops as soon as all
downloaded bundles can be applied to the repository (because all
required commits are present in the repository or in the downloaded
bundles).

When checking the same bundle list twice, this strategy requires
downloading the bundle with the maximum creationToken again, which is
wasteful. The creationToken heuristic promises that the client will not
have a use for that bundle if its creationToken value is at most the
previous creationToken value.

To prevent these wasteful downloads, create a fetch.bundleCreationToken
config setting that the Git client sets after downloading bundles. This
value allows skipping that maximum bundle download when this config
value is the same value (or larger).

To test that this works correctly, we can insert some "duplicate"
fetches into existing tests and demonstrate that only the bundle list is
downloaded.

The previous logic for downloading bundles by creationToken worked even
if the bundle list was empty, but now we have logic that depends on the
first entry of the list. Terminate early in the (non-sensical) case of
an empty bundle list.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee 7f0cc04f2c fetch: fetch from an external bundle URI
When a user specifies a URI via 'git clone --bundle-uri', that URI may
be a bundle list that advertises a 'bundle.heuristic' value. In that
case, the Git client stores a 'fetch.bundleURI' config value storing
that URI.

Teach 'git fetch' to check for this config value and download bundles
from that URI before fetching from the Git remote(s). Likely, the bundle
provider has configured a heuristic (such as "creationToken") that will
allow the Git client to download only a portion of the bundles before
continuing the fetch.

Since this URI is completely independent of the remote server, we want
to be sure that we connect to the bundle URI before creating a
connection to the Git remote. We do not want to hold a stateful
connection for too long if we can avoid it.

To test that this works correctly, extend the previous tests that set
'fetch.bundleURI' to do follow-up fetches. The bundle list is updated
incrementally at each phase to demonstrate that the heuristic avoids
downloading older bundles. This includes the middle fetch downloading
the objects in bundle-3.bundle from the Git remote, and therefore not
needing that bundle in the third fetch.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee 4074d3c7e1 clone: set fetch.bundleURI if appropriate
Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee 7903efb717 bundle-uri: download in creationToken order
The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:

 ---------------- bundle-4

       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)

In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee 512fccf8a5 bundle-uri: parse bundle.<id>.creationToken values
The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.<id>.creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee c93c3d2fa4 bundle-uri: parse bundle.heuristic=creationToken
The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.<id>.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.

Before implementing any logic that parses or uses the
bundle.<id>.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.

As an extra precaution, create the internal 'heuristics' array to be a
list of (enum, string) pairs so we can iterate through the array entries
carefully, regardless of the enum values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:48 -08:00
Derrick Stolee 7bc73e7b61 t5558: add tests for creationToken heuristic
As documented in the bundle URI design doc in 2da14fad8f (docs:
document bundle URI standard, 2022-08-09), the 'creationToken' member of
a bundle URI allows a bundle provider to specify a total order on the
bundles.

Future changes will allow the Git client to understand these members and
modify its behavior around downloading the bundles in that order. In the
meantime, create tests that add creation tokens to the bundle list. For
now, the Git client correctly ignores these unknown keys.

Create a new test helper function, test_remote_https_urls, which filters
GIT_TRACE2_EVENT output to extract a list of URLs passed to
git-remote-https child processes. This can be used to verify the order
of these requests as we implement the creationToken heuristic. For now,
we need to sort the actual output since the current client does not have
a well-defined order that it applies to the bundles.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:47 -08:00
Derrick Stolee d9fd674c8b bundle: verify using check_connected()
When Git verifies a bundle to see if it is safe for unbundling, it first
looks to see if the prerequisite commits are in the object store. This
is an easy way to "fail fast" but it is not a sufficient check for
updating refs that guarantee closure under reachability. There could
still be issues if those commits are not reachable from the repository's
references. The repository only has guarantees that its object store is
closed under reachability for the objects that are reachable from
references.

Thus, the code in verify_bundle() has previously had the additional
check that all prerequisite commits are reachable from repository
references. This is done via a revision walk from all references,
stopping only if all prerequisite commits are discovered or all commits
are walked. This uses a custom walk to verify_bundle().

This check is more strict than what Git applies to fetched pack-files.
In the fetch case, Git guarantees that the new references are closed
under reachability by walking from the new references until walking
commits that are reachable from repository refs. This is done through
the well-used check_connected() method.

To better align with the restrictions required by 'git fetch',
reimplement this check in verify_bundle() to use check_connected(). This
also simplifies the code significantly.

The previous change added a test that verified the behavior of 'git
bundle verify' and 'git bundle unbundle' in this case, and the error
messages looked like this:

  error: Could not read <missing-commit>
  fatal: Failed to traverse parents of commit <extant-commit>

However, by changing the revision walk slightly within check_connected()
and using its quiet mode, we can omit those messages. Instead, we get
only this message, tailored to describing the current state of the
repository:

  error: some prerequisite commits exist in the object store,
         but are not connected to the repository's history

(Line break added here for the commit message formatting, only.)

While this message does not include any object IDs, there is no
guarantee that those object IDs would help the user diagnose what is
going on, as they could be separated from the prerequisite commits by
some distance. At minimum, this situation describes the situation in a
more informative way than the previous error messages.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:47 -08:00
Derrick Stolee e72171f085 bundle: test unbundling with incomplete history
When verifying a bundle, Git checks first that all prerequisite commits
exist in the object store, then adds an additional check: those
prerequisite commits must be reachable from references in the
repository.

This check is stronger than what is checked for refs being added during
'git fetch', which simply guarantees that the new refs have a complete
history up to the point where it intersects with the current reachable
history.

However, we also do not have any tests that check the behavior under
this condition. Create a test that demonstrates its behavior.

In order to construct a broken history, perform a shallow clone of a
repository with a linear history, but whose default branch ('base') has
a single commit, so dropping the shallow markers leaves a complete
history from that reference. However, the 'tip' reference adds a
shallow commit whose parent is missing in the cloned repository. Trying
to unbundle a bundle with the 'tip' as a prerequisite will succeed past
the object store check and move into the reachability check.

The two errors that are reported are of this form:

  error: Could not read <missing-commit>
  fatal: Failed to traverse parents of commit <present-commit>

These messages are not particularly helpful for the person running the
unbundle command, but they do prevent the command from succeeding.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-31 08:57:47 -08:00
Junio C Hamano 777afaaa5c Merge branch 'tb/t0003-invoke-dd-more-portably'
Test portability fix.

* tb/t0003-invoke-dd-more-portably:
  t0003: call dd with portable blocksize
2023-01-30 14:24:23 -08:00
Junio C Hamano abf2bb895b Merge branch 'jk/hash-object-fsck'
"git hash-object" now checks that the resulting object is well
formed with the same code as "git fsck".

* jk/hash-object-fsck:
  fsck: do not assume NUL-termination of buffers
  hash-object: use fsck for object checks
  fsck: provide a function to fsck buffer without object struct
  t: use hash-object --literally when created malformed objects
  t7030: stop using invalid tag name
  t1006: stop using 0-padded timestamps
  t1007: modernize malformed object tests
2023-01-30 14:24:22 -08:00
Junio C Hamano 4ac326f64f Merge branch 'po/pretty-format-columns-doc'
Clarify column-padding operators in the pretty format string.

* po/pretty-format-columns-doc:
  doc: pretty-formats note wide char limitations, and add tests
  doc: pretty-formats describe use of ellipsis in truncation
  doc: pretty-formats document negative column alignments
  doc: pretty-formats: delineate `%<|(` parameter values
  doc: pretty-formats: separate parameters from placeholders
2023-01-30 14:24:22 -08:00
Derrick Stolee dea6308892 scalar: only warn when background maintenance fails
A user reported issues with 'scalar clone' and 'scalar register' when
working in an environment that had locked down the ability to run
'crontab' or 'systemctl' in that those commands registered as _failures_
instead of opportunistically reporting a success with just a warning
about background maintenance.

As a workaround, they can use GIT_TEST_MAINT_SCHEDULER to fake a
successful background maintenance, but this is not a viable strategy for
long-term.

Update 'scalar register' and 'scalar clone' to no longer fail by
modifying register_dir() to only warn when toggle_maintenance(1) fails.

Since background maintenance is a "nice to have" and not a requirement
for a working repository, it is best to move this from hard error to
gentle warning.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-27 12:38:26 -08:00
Derrick Stolee eeea9ae165 t921*: test scalar behavior starting maintenance
A user recently reported issues with 'scalar register' and 'scalar
clone' in that they failed when the system had permissions locked down
so both 'crontab' and 'systemctl' commands failed when trying to enable
background maintenance.

This hard error is undesirable, but let's create tests that demonstrate
this behavior before modiying the behavior. We can use
GIT_TEST_MAINT_SCHEDULER to guarantee failure and check the exit code
and error message.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-27 12:38:26 -08:00
Derrick Stolee 008217cb4a t: allow 'scalar' in test_must_fail
This will enable scalar tests to use the test_must_fail helper, when
necessary.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-27 12:38:26 -08:00
Junio C Hamano d26e26a3f5 Merge branch 'cw/fetch-remote-group-with-duplication'
"git fetch <group>", when "<group>" of remotes lists the same
remote twice, unnecessarily failed when parallel fetching was
enabled, which has been corrected.

* cw/fetch-remote-group-with-duplication:
  fetch: fix duplicate remote parallel fetch bug
2023-01-27 08:51:41 -08:00
Junio C Hamano 531d13d4d2 Merge branch 'km/send-email-with-v-reroll-count'
"git send-email -v 3" used to be expanded to "git send-email
--validate 3" when the user meant to pass them down to
"format-patch", which has been corrected.

* km/send-email-with-v-reroll-count:
  send-email: relay '-v N' to format-patch
2023-01-27 08:51:40 -08:00
Junio C Hamano 557d93a146 Merge branch 'cb/grep-pcre-ucp'
"grep -P" learned to use Unicode Character Property to grok
character classes when processing \b and \w etc.

* cb/grep-pcre-ucp:
  grep: correctly identify utf-8 characters with \{b,w} in -P
2023-01-27 08:51:40 -08:00
Elijah Newren eddfcd8ece rebase: provide better error message for apply options vs. merge config
When config which selects the merge backend (currently,
rebase.autosquash=true or rebase.updateRefs=true) conflicts with other
options on the command line (such as --whitespace=fix), make the error
message specifically call out the config option and specify how to
override that config option on the command line.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-25 09:20:53 -08:00
Elijah Newren 796abac7e1 rebase: add coverage of other incompatible options
The git-rebase manual noted several sets of incompatible options, but
we were missing tests for a few of these.  Further, we were missing
code checks for one of these, which could result in command line
options being silently ignored.

Also, note that adding a check for autosquash means that using
--whitespace=fix together with the config setting rebase.autosquash=true
will trigger an error.  A subsequent commit will improve the error
message.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-25 09:20:53 -08:00
Elijah Newren ffeaca177a rebase: fix incompatiblity checks for --[no-]reapply-cherry-picks
--[no-]reapply-cherry-picks was traditionally only supported by the
sequencer.  Support was added for the apply backend, when --keep-base is
also specified, in commit ce5238a690 ("rebase --keep-base: imply
--reapply-cherry-picks", 2022-10-17).  Make the code error out when
--[no-]reapply-cherry-picks is specified AND the apply backend is used
AND --keep-base is not specified.  Also, clarify a number of comments
surrounding the interaction of these flags.

Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-25 09:20:53 -08:00
Elijah Newren b8ad365640 rebase: fix docs about incompatibilities with --root
In commit 5dacd4abdd ("git-rebase.txt: document incompatible options",
2018-06-25), I added notes about incompatibilities between options for
the apply and merge backends.  Unfortunately, I inverted the condition
when --root was incompatible with the apply backend.  Fix the
documentation, and add a testcase that verifies the documentation
matches the code.

While at it, the documentation for --root also tried to cover some of
the backend differences between the apply and merge backends in relation
to reapplying cherry picks.  The information:
  * assumed that the apply backend was the default (it isn't anymore)
  * was written before --reapply-cherry-picks became an option
  * was written before the detailed information on backend differences
All of these factors make the sentence under --root about reapplying
cherry picks contradict information that is now available elsewhere in
the manual, and the other references are correct.  So just strike this
sentence.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-25 09:20:53 -08:00
Elijah Newren 7d718c552b rebase: flag --apply and --merge as incompatible
Previously, we flagged options which implied --apply as being
incompatible with options which implied --merge.  But if both options
were given explicitly, then we didn't flag the incompatibility.  The
same is true with --apply and --interactive.  Add the check, and add
some testcases to verify these are also caught.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-25 09:20:52 -08:00
Elijah Newren 1207599e83 rebase: mark --update-refs as requiring the merge backend
--update-refs is built in terms of the sequencer, which requires the
merge backend.  It was already marked as incompatible with the apply
backend in the git-rebase manual, but the code didn't check for this
incompatibility and warn the user.  Check and error now.

While at it, fix a typo in t3422...and fix some misleading wording
(most options which used to be am-specific have since been implemented
in the merge backend as well).

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-25 09:20:52 -08:00
Taylor Blau bffc762f87 dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
When using the dir_iterator API, we first stat(2) the base path, and
then use that as a starting point to enumerate the directory's contents.

If the directory contains symbolic links, we will immediately die() upon
encountering them without the `FOLLOW_SYMLINKS` flag. The same is not
true when resolving the top-level directory, though.

As explained in a previous commit, this oversight in 6f054f9fb3
(builtin/clone.c: disallow `--local` clones with symlinks, 2022-07-28)
can be used as an attack vector to include arbitrary files on a victim's
filesystem from outside of the repository.

Prevent resolving top-level symlinks unless the FOLLOW_SYMLINKS flag is
given, which will cause clones of a repository with a symlink'd
"$GIT_DIR/objects" directory to fail.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-24 16:52:16 -08:00
Taylor Blau cf8f6ce02a clone: delay picking a transport until after get_repo_path()
In the previous commit, t5619 demonstrates an issue where two calls to
`get_repo_path()` could trick Git into using its local clone mechanism
in conjunction with a non-local transport.

That sequence is:

 - the starting state is that the local path https:/example.com/foo is a
   symlink that points to ../../../.git/modules/foo. So it's dangling.

 - get_repo_path() sees that no such path exists (because it's
   dangling), and thus we do not canonicalize it into an absolute path

 - because we're using --separate-git-dir, we create .git/modules/foo.
   Now our symlink is no longer dangling!

 - we pass the url to transport_get(), which sees it as an https URL.

 - we call get_repo_path() again, on the url. This second call was
   introduced by f38aa83f9a (use local cloning if insteadOf makes a
   local URL, 2014-07-17). The idea is that we want to pull the url
   fresh from the remote.c API, because it will apply any aliases.

And of course now it sees that there is a local file, which is a
mismatch with the transport we already selected.

The issue in the above sequence is calling `transport_get()` before
deciding whether or not the repository is indeed local, and not passing
in an absolute path if it is local.

This is reminiscent of a similar bug report in [1], where it was
suggested to perform the `insteadOf` lookup earlier. Taking that
approach may not be as straightforward, since the intent is to store the
original URL in the config, but to actually fetch from the insteadOf
one, so conflating the two early on is a non-starter.

Note: we pass the path returned by `get_repo_path(remote->url[0])`,
which should be the same as `repo_name` (aside from any `insteadOf`
rewrites).

We *could* pass `absolute_pathdup()` of the same argument, which
86521acaca (Bring local clone's origin URL in line with that of a remote
clone, 2008-09-01) indicates may differ depending on the presence of
".git/" for a non-bare repo. That matters for forming relative submodule
paths, but doesn't matter for the second call, since we're just feeding
it to the transport code, which is fine either way.

[1]: https://lore.kernel.org/git/CAMoD=Bi41mB3QRn3JdZL-FGHs4w3C2jGpnJB-CqSndO7FMtfzA@mail.gmail.com/

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-24 16:52:16 -08:00
Taylor Blau 58325b93c5 t5619: demonstrate clone_local() with ambiguous transport
When cloning a repository, Git must determine (a) what transport
mechanism to use, and (b) whether or not the clone is local.

Since f38aa83f9a (use local cloning if insteadOf makes a local URL,
2014-07-17), the latter check happens after the remote has been
initialized, and references the remote's URL instead of the local path.
This is done to make it possible for a `url.<base>.insteadOf` rule to
convert a remote URL into a local one, in which case the `clone_local()`
mechanism should be used.

However, with a specially crafted repository, Git can be tricked into
using a non-local transport while still setting `is_local` to "1" and
using the `clone_local()` optimization. The below test case
demonstrates such an instance, and shows that it can be used to include
arbitrary (known) paths in the working copy of a cloned repository on a
victim's machine[^1], even if local file clones are forbidden by
`protocol.file.allow`.

This happens in a few parts:

 1. We first call `get_repo_path()` to see if the remote is a local
    path. If it is, we replace the repo name with its absolute path.

 2. We then call `transport_get()` on the repo name and decide how to
    access it. If it was turned into an absolute path in the previous
    step, then we should always treat it like a file.

 3. We use `get_repo_path()` again, and set `is_local` as appropriate.
    But it's already too late to rewrite the repo name as an absolute
    path, since we've already fed it to the transport code.

The attack works by including a submodule whose URL corresponds to a
path on disk. In the below example, the repository "sub" is reachable
via the dumb HTTP protocol at (something like):

    http://127.0.0.1:NNNN/dumb/sub.git

However, the path "http:/127.0.0.1:NNNN/dumb" (that is, a top-level
directory called "http:", then nested directories "127.0.0.1:NNNN", and
"dumb") exists within the repository, too.

To determine this, it first picks the appropriate transport, which is
dumb HTTP. It then uses the remote's URL in order to determine whether
the repository exists locally on disk. However, the malicious repository
also contains an embedded stub repository which is the target of a
symbolic link at the local path corresponding to the "sub" repository on
disk (i.e., there is a symbolic link at "http:/127.0.0.1/dumb/sub.git",
pointing to the stub repository via ".git/modules/sub/../../../repo").

This stub repository fools Git into thinking that a local repository
exists at that URL and thus can be cloned locally. The affected call is
in `get_repo_path()`, which in turn calls `get_repo_path_1()`, which
locates a valid repository at that target.

This then causes Git to set the `is_local` variable to "1", and in turn
instructs Git to clone the repository using its local clone optimization
via the `clone_local()` function.

The exploit comes into play because the stub repository's top-level
"$GIT_DIR/objects" directory is a symbolic link which can point to an
arbitrary path on the victim's machine. `clone_local()` resolves the
top-level "objects" directory through a `stat(2)` call, meaning that we
read through the symbolic link and copy or hardlink the directory
contents at the destination of the link.

In other words, we can get steps (1) and (3) to disagree by leveraging
the dangling symlink to pick a non-local transport in the first step,
and then set is_local to "1" in the third step when cloning with
`--separate-git-dir`, which makes the symlink non-dangling.

This can result in data-exfiltration on the victim's machine when
sensitive data is at a known path (e.g., "/home/$USER/.ssh").

The appropriate fix is two-fold:

 - Resolve the transport later on (to avoid using the local
   clone optimization with a non-local transport).

 - Avoid reading through the top-level "objects" directory when
   (correctly) using the clone_local() optimization.

This patch merely demonstrates the issue. The following two patches will
implement each part of the above fix, respectively.

[^1]: Provided that any target directory does not contain symbolic
  links, in which case the changes from 6f054f9fb3 (builtin/clone.c:
  disallow `--local` clones with symlinks, 2022-07-28) will abort the
  clone.

Reported-by: yvvdwf <yvvdwf@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-24 16:52:16 -08:00
Junio C Hamano ebed06a3e9 Merge branch 'zh/scalar-progress'
"scalar" learned to give progress bar.

* zh/scalar-progress:
  scalar: show progress if stderr refers to a terminal
2023-01-23 13:39:52 -08:00
Junio C Hamano 5287319bf8 Merge branch 'ds/omit-trailing-hash-in-index'
Quickfix for a topic already in 'master'.

* ds/omit-trailing-hash-in-index:
  t1600: fix racy index.skipHash test
2023-01-23 13:39:52 -08:00
Junio C Hamano cd37c45acf Merge branch 'ab/test-env-helper'
Remove "git env--helper" and demote it to a test-tool subcommand.

* ab/test-env-helper:
  env-helper: move this built-in to "test-tool env-helper"
2023-01-23 13:39:51 -08:00
Junio C Hamano 577bff3a81 Merge branch 'kn/attr-from-tree'
"git check-attr" learned to take an optional tree-ish to read the
.gitattributes file from.

* kn/attr-from-tree:
  attr: add flag `--source` to work with tree-ish
  t0003: move setup for `--all` into new block
2023-01-23 13:39:51 -08:00
Junio C Hamano 8a40af9cab Merge branch 'rs/ls-tree-path-expansion-fix'
"git ls-tree --format='%(path) %(path)' $tree $path" showed the
path three times, which has been corrected.

* rs/ls-tree-path-expansion-fix:
  ls-tree: remove dead store and strbuf for quote_c_style()
  ls-tree: fix expansion of repeated %(path)
2023-01-23 13:39:50 -08:00
Junio C Hamano b269563512 Merge branch 'en/t6426-todo-cleanup'
Test clean-up.

* en/t6426-todo-cleanup:
  t6426: fix TODO about making test more comprehensive
2023-01-23 13:39:50 -08:00
Rubén Justo 7fb89047cc bisect: fix "reset" when branch is checked out elsewhere
Since 1d0fa89 (checkout: add --ignore-other-wortrees, 2015-01-03) we
have a safety valve in checkout/switch to prevent the same branch from
being checked out simultaneously in multiple worktrees.

If a branch is bisected in a worktree while also being checked out in
another worktree; when the bisection is finished, checking out the
branch back in the current worktree may fail.

Let's teach bisect to use the "--ignore-other-worktrees" flag.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-22 09:23:11 -08:00
Torsten Bögershausen 5458ba0a4d t0003: call dd with portable blocksize
The command `dd bs=101M count=1` is not portable,
e.g. dd shipped with MacOs does not understand the 'M'.

Use `dd bs=1048576 count=101`, which achives the same, instead.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-22 08:14:40 -08:00
Junio C Hamano 30b4e5c888 Merge branch 'ab/bisect-cleanup'
Code clean-up.

* ab/bisect-cleanup:
  bisect: no longer try to clean up left-over `.git/head-name` files
  bisect: remove Cogito-related code
  bisect run: fix the error message
  bisect: verify that a bogus option won't try to start a bisection
  bisect--helper: make the order consistently `argc, argv`
  bisect--helper: simplify exit code computation
2023-01-21 17:22:01 -08:00
Junio C Hamano 38a49aba90 Merge branch 'tl/ls-tree-code-clean-up'
Code clean-up.

* tl/ls-tree-code-clean-up:
  t3104: remove shift code in 'test_ls_tree_format'
  ls-tree: cleanup the redundant SPACE
  ls-tree: make "line_termination" less generic
  ls-tree: fold "show_tree_data" into "cb" struct
  ls-tree: use a "struct options"
  ls-tree: don't use "show_tree_data" for "fast" callbacks
2023-01-21 17:22:00 -08:00
Junio C Hamano d2917b9099 Merge branch 'ph/parse-date-reduced-precision'
Loosen date parsing heuristics.

* ph/parse-date-reduced-precision:
  date.c: allow ISO 8601 reduced precision times
2023-01-21 17:22:00 -08:00
Junio C Hamano 42423c61d9 Merge branch 'jk/interop-error'
Test helper improvement.

* jk/interop-error:
  t/interop: report which vanilla git command failed
2023-01-21 17:21:59 -08:00
Junio C Hamano 013f168211 Merge branch 'ar/test-cleanup'
Test clean-up.

* ar/test-cleanup:
  t7527: use test_when_finished in 'case insensitive+preserving'
  t6422: drop commented out code
  t6003: uncomment test '--max-age=c3, --topo-order'
2023-01-21 17:21:59 -08:00
Junio C Hamano 90c47b3fba Merge branch 'jx/t1301-updates'
Test updates.

* jx/t1301-updates:
  t1301: do not change $CWD in "shared=all" test case
  t1301: use test_when_finished for cleanup
  t1301: fix wrong template dir for git-init
2023-01-21 17:21:58 -08:00
Jeff King 8e4309038f fsck: do not assume NUL-termination of buffers
The fsck code operates on an object buffer represented as a pointer/len
combination. However, the parsing of commits and tags is a little bit
loose; we mostly scan left-to-right through the buffer, without checking
whether we've gone past the length we were given.

This has traditionally been OK because the buffers we feed to fsck
always have an extra NUL after the end of the object content, which ends
any left-to-right scan. That has always been true for objects we read
from the odb, and we made it true for incoming index-pack/unpack-objects
checks in a1e920a0a7 (index-pack: terminate object buffers with NUL,
2014-12-08).

However, we recently added an exception: hash-object asks index_fd() to
do fsck checks. That _may_ have an extra NUL (if we read from a pipe
into a strbuf), but it might not (if we read the contents from the
file). Nor can we just teach it to always add a NUL. We may mmap the
on-disk file, which will not have any extra bytes (if it's a multiple of
the page size). Not to mention that this is a rather subtle assumption
for the fsck code to make.

Instead, let's make sure that the fsck parsers don't ever look past the
size of the buffer they've been given. This _almost_ works already,
thanks to earlier work in 4d0d89755e (Make sure fsck_commit_buffer()
does not run out of the buffer, 2014-09-11). The theory there is that we
check up front whether we have the end of header double-newline
separator. And then any left-to-right scanning we do is OK as long as it
stops when it hits that boundary.

However, we later softened that in 84d18c0bcf (fsck: it is OK for a tag
and a commit to lack the body, 2015-06-28), which allows the
double-newline header to be missing, but does require that the header
ends in a newline. That was OK back then, because of the NUL-termination
guarantees (including the one from a1e920a0a7 mentioned above).

Because 84d18c0bcf guarantees that any header line does end in a
newline, we are still OK with most of the left-to-right scanning. We
only need to take care after completing a line, to check that there is
another line (and we didn't run out of buffer).

Most of these checks are just need to check "buffer < buffer_end" (where
buffer is advanced as we parse) before scanning for the next header
line. But here are a few notes:

  - we don't technically need to check for remaining buffer before
    parsing the very first line ("tree" for a commit, or "object" for a
    tag), because verify_headers() rejects a totally empty buffer. But
    we'll do so in the name of consistency and defensiveness.

  - there are some calls to strchr('\n'). These are actually OK by the
    "the final header line must end in a newline" guarantee from
    verify_headers(). They will always find that rather than run off the
    end of the buffer. Curiously, they do check for a NULL return and
    complain, but I believe that condition can never be reached.

    However, I converted them to use memchr() with a proper size and
    retained the NULL checks. Using memchr() is not much longer and
    makes it more obvious what is going on. Likewise, retaining the NULL
    checks serves as a defensive measure in case my analysis is wrong.

  - commit 9a1a3a4d4c (mktag: allow omitting the header/body \n
    separator, 2021-01-05), does check for the end-of-buffer condition,
    but does so with "!*buffer", relying explicitly on the NUL
    termination. We can accomplish the same thing with a pointer
    comparison. I also folded it into the follow-on conditional that
    checks the contents of the buffer, for consistency with the other
    checks.

  - fsck_ident() uses parse_timestamp(), which is based on strtoumax().
    That function will happily skip past leading whitespace, including
    newlines, which makes it a risk. We can fix this by scanning to the
    first digit ourselves, and then using parse_timestamp() to do the
    actual numeric conversion.

    Note that as a side effect this fixes the fact that we missed
    zero-padded timestamps like "<email>   0123" (whereas we would
    complain about "<email> 0123"). I doubt anybody cares, but I
    mention it here for completeness.

  - fsck_tree() does not need any modifications. It relies on
    decode_tree_entry() to do the actual parsing, and that function
    checks both that there are enough bytes in the buffer to represent
    an entry, and that there is a NUL at the appropriate spot (one
    hash-length from the end; this may not be the NUL for the entry we
    are parsing, but we know that in the worst case, everything from our
    current position to that NUL is a filename, so we won't run out of
    bytes).

In addition to fixing the code itself, we'd like to make sure our rather
subtle assumptions are not violated in the future. So this patch does
two more things:

  - add comments around verify_headers() documenting the link between
    what it checks and the memory safety of the callers. I don't expect
    this code to be modified frequently, but this may help somebody from
    accidentally breaking things.

  - add a thorough set of tests covering truncations at various key
    spots (e.g., for a "tree $oid" line, in the middle of the word
    "tree", right after it, after the space, in the middle of the $oid,
    and right at the end of the line. Most of these are fine already (it
    is only truncating right at the end of the line that is currently
    broken). And some of them are not even possible with the current
    code (we parse "tree " as a unit, so truncating before the space is
    equivalent). But I aimed here to consider the code a black box and
    look for any truncations that would be a problem for a left-to-right
    parser.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-19 15:39:43 -08:00
Calvin Wan 06a668cb90 fetch: fix duplicate remote parallel fetch bug
Fetching in parallel from a remote group with a duplicated remote results
in the following:

error: cannot lock ref '<ref>': is at <oid> but expected <oid>

This doesn't happen in serial since fetching from the same remote that
has already been fetched from is a noop. Therefore, remove any duplicated
remotes after remote groups are parsed.

Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-19 14:41:48 -08:00
Philip Oakley 540e7bc477 doc: pretty-formats note wide char limitations, and add tests
The previous commits added clarifications to the column alignment
placeholders, note that the spaces are optional around the parameters.

Also, a proposed extension [1] to allow hard truncation (without
ellipsis '..') highlighted that the existing code does not play well
with wide characters, such as Asian fonts and emojis.

For example, N wide characters take 2N columns so won't fit an odd number
column width, causing misalignment somewhere.

Further analysis also showed that decomposed characters, e.g. separate
`a` + `umlaut` Unicode code-points may also be mis-counted, in some cases
leaving multiple loose `umlauts` all combined together.

Add some notes about these limitations, and add basic tests to demonstrate
them.

The chosen solution for the tests is to substitute any wide character
that overlaps a splitting boundary for the unicode vertical ellipsis
code point as a rare but 'obvious' substitution.

An alternative could be the substitution with a single dot '.' which
matches regular expression usage, and our two dot ellipsis, and further
in scenarios where the bulk of the text is wide characters, would be
obvious. In mainly 'ascii' scenarios a singleton emoji being substituted
by a dot could be confusing.

It is enough that the tests fail cleanly. The final choice for the
substitute character can be deferred.

[1]
https://lore.kernel.org/git/20221030185614.3842-1-philipoakley@iee.email/

Signed-off-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-19 14:35:15 -08:00
Carlo Marcelo Arenas Belón acabd2048e grep: correctly identify utf-8 characters with \{b,w} in -P
When UTF is enabled for a PCRE match, the corresponding flags are
added to the pcre2_compile() call, but PCRE2_UCP wasn't included.

This prevents extending the meaning of the character classes to
include those new valid characters and therefore result in failed
matches for expressions that rely on that extention, for ex:

  $ git grep -P '\bÆvar'

Add PCRE2_UCP so that \w will include Æ and therefore \b could
correctly match the beginning of that word.

This has an impact on performance that has been estimated to be
between 20% to 40% and that is shown through the added performance
test.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-18 15:24:52 -08:00
Jeff King 69bbbe484b hash-object: use fsck for object checks
Since c879daa237 (Make hash-object more robust against malformed
objects, 2011-02-05), we've done some rudimentary checks against objects
we're about to write by running them through our usual parsers for
trees, commits, and tags.

These parsers catch some problems, but they are not nearly as careful as
the fsck functions (which make sense; the parsers are designed to be
fast and forgiving, bailing only when the input is unintelligible). We
are better off doing the more thorough fsck checks when writing objects.
Doing so at write time is much better than writing garbage only to find
out later (after building more history atop it!) that fsck complains
about it, or hosts with transfer.fsckObjects reject it.

This is obviously going to be a user-visible behavior change, and the
test changes earlier in this series show the scope of the impact. But
I'd argue that this is OK:

  - the documentation for hash-object is already vague about which
    checks we might do, saying that --literally will allow "any
    garbage[...] which might not otherwise pass standard object parsing
    or git-fsck checks". So we are already covered under the documented
    behavior.

  - users don't generally run hash-object anyway. There are a lot of
    spots in the tests that needed to be updated because creating
    garbage objects is something that Git's tests disproportionately do.

  - it's hard to imagine anyone thinking the new behavior is worse. Any
    object we reject would be a potential problem down the road for the
    user. And if they really want to create garbage, --literally is
    already the escape hatch they need.

Note that the change here is actually in index_mem(), which handles the
HASH_FORMAT_CHECK flag passed by hash-object. That flag is also used by
"git-replace --edit" to sanity-check the result. Covering that with more
thorough checks likewise seems like a good thing.

Besides being more thorough, there are a few other bonuses:

  - we get rid of some questionable stack allocations of object structs.
    These don't seem to currently cause any problems in practice, but
    they subtly violate some of the assumptions made by the rest of the
    code (e.g., the "struct commit" we put on the stack and
    zero-initialize will not have a proper index from
    alloc_comit_index().

  - likewise, those parsed object structs are the source of some small
    memory leaks

  - the resulting messages are much better. For example:

      [before]
      $ echo 'tree 123' | git hash-object -t commit --stdin
      error: bogus commit object 0000000000000000000000000000000000000000
      fatal: corrupt commit

      [after]
      $ echo 'tree 123' | git.compile hash-object -t commit --stdin
      error: object fails fsck: badTreeSha1: invalid 'tree' line format - bad sha1
      fatal: refusing to create malformed object

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-18 12:59:45 -08:00
Jeff King 34959d80db t: use hash-object --literally when created malformed objects
Many test scripts use hash-object to create malformed objects to see how
we handle the results in various commands. In some cases we already have
to use "hash-object --literally", because it does some rudimentary
quality checks. But let's use "--literally" more consistently to
future-proof these tests against hash-object learning to be more
careful.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-18 12:59:44 -08:00
Jeff King ad5dfeac04 t7030: stop using invalid tag name
We intentionally invalidate the signature of a tag by switching its tag
name from "seventh" to "7th forged". However, the latter is not a valid
tag name because it contains a space. This doesn't currently affect the
test, but we're better off using something syntactically valid. That
reduces the number of possible failure modes in the test, and
future-proofs us if git hash-object gets more picky about its input.

The t7031 script, which was mostly copied from t7030, has the same
problem, so we'll fix it, too.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-18 12:59:44 -08:00
Jeff King 61cc4be7ec t1006: stop using 0-padded timestamps
The fake objects in t1006 use dummy timestamps like "0000000000 +0000".
While this does make them look more like normal timestamps (which,
unless it is 1970, have many digits), it actually violates our fsck
checks, which complain about zero-padded timestamps.

This doesn't currently break anything, but let's future-proof our tests
against a version of hash-object which is a little more careful about
its input. We don't actually care about the exact values here (and in
fact, the helper functions in this script end up removing the timestamps
anyway, so we don't even have to adjust other parts of the tests).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-18 12:59:44 -08:00
Jeff King 6e2646075c t1007: modernize malformed object tests
The tests in t1007 for detecting malformed objects have two
anachronisms:

 - they use "sha1" instead of "oid" in variable names, even though the
   script as a whole has been adapted to handle sha256

 - they use test_i18ngrep, which is no longer necessary

Since we'll be adding a new similar test, let's clean these up so they
are all consistently using the modern style.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-18 12:59:44 -08:00
Derrick Stolee 42ea7a4150 t1600: fix racy index.skipHash test
The test 1600.6 can fail under --stress due to mtime collisions. Most of
the tests include a removal of the index file to guarantee that the
index is updated. However, the submodule test addded in ee1f0c242e
(read-cache: add index.skipHash config option, 2023-01-06) did not
include this removal. Thus, on rare occasions, the test can fail because
the index still has a non-null trailing hash, as detected by the helper
added in da9acde14e (test-lib-functions: add helper for trailing hash,
2023-01-06).

By removing the submodule's index before the 'git -C sub add a' command,
we guarantee that the index is rewritten with the new index.skipHash
config option.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-17 07:41:44 -08:00
Junio C Hamano 508386c6c5 Sync with 2.39.1 2023-01-16 12:11:58 -08:00
Junio C Hamano 3ed618f28f Merge branch 'ar/dup-words-fixes'
Typofixes.

* ar/dup-words-fixes:
  *: fix typos which duplicate a word
2023-01-16 12:07:47 -08:00
Junio C Hamano ffd9238685 Merge branch 'ds/omit-trailing-hash-in-index'
Introduce an optional configuration to allow the trailing hash that
protects the index file from bit flipping.

* ds/omit-trailing-hash-in-index:
  features: feature.manyFiles implies fast index writes
  test-lib-functions: add helper for trailing hash
  read-cache: add index.skipHash config option
  hashfile: allow skipping the hash function
2023-01-16 12:07:47 -08:00
Junio C Hamano ab85a7de6d Merge branch 'ws/single-file-cone'
The logic to see if we are using the "cone" mode by checking the
sparsity patterns has been tightened to avoid mistaking a pattern
that names a single file as specifying a cone.

* ws/single-file-cone:
  dir: check for single file cone patterns
2023-01-16 12:07:47 -08:00
Junio C Hamano 1120c54c12 Merge branch 'jk/ext-diff-with-relative'
"git diff --relative" did not mix well with "git diff --ext-diff",
which has been corrected.

* jk/ext-diff-with-relative:
  diff: drop "name" parameter from prepare_temp_file()
  diff: clean up external-diff argv setup
  diff: use filespec path to set up tempfiles for ext-diff
2023-01-16 12:07:46 -08:00
Junio C Hamano af8a3bb853 Merge branch 'ds/bundle-uri-4'
Code clean-up.

* ds/bundle-uri-4:
  test-bundle-uri: drop unused variables
2023-01-16 12:07:46 -08:00
Junio C Hamano b242e89dff Merge branch 'tr/am--no-verify'
Conditionally skip the pre-applypatch and applypatch-msg hooks when
applying patches with 'git am'.

* tr/am--no-verify:
  am: allow passing --no-verify flag
2023-01-16 12:07:46 -08:00
Junio C Hamano 7c7357910b Merge branch 'es/t1509-root-fixes'
Test fixes.

* es/t1509-root-fixes:
  t1509: facilitate repeated script invocations
  t1509: make "setup" test more robust
  t1509: fix failing "root work tree" test due to owner-check
2023-01-16 12:07:45 -08:00
ZheNing Hu 4433bd24e4 scalar: show progress if stderr refers to a terminal
Sometimes when users use scalar to download a monorepo with a long
commit history, they want to check the progress bar to know how long
they still need to wait during the fetch process, but scalar
suppresses this output by default.

So let's check whether scalar stderr refer to a terminal, if so,
show progress, otherwise disable it.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-16 10:42:22 -08:00
René Scharfe 16fb5c54bd ls-tree: fix expansion of repeated %(path)
expand_show_tree() borrows the base strbuf given to us by read_tree() to
build the full path of the current entry when handling %(path).  Only
its indirect caller, show_tree_fmt(), removes the added entry name.
That works fine as long as %(path) is only included once in the format
string, but accumulates duplicates if it's repeated:

   $ git ls-tree --format='%(path) %(path) %(path)' HEAD M*
   Makefile MakefileMakefile MakefileMakefileMakefile

Reset the length after each use to get the same expansion every time;
here's the behavior with this patch:

   $ ./git ls-tree --format='%(path) %(path) %(path)' HEAD M*
   Makefile Makefile Makefile

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-14 19:22:26 -08:00
Elijah Newren dcb47e52b0 t6426: fix TODO about making test more comprehensive
t6426.7 (a rename/add testcase) long had a TODO/FIXME comment about
how the test could be improved (with some commented out sample code
that had a few small errors), but those improvements were blocked on
other changes still in progress.  The necessary changes were put in
place years ago but the comment was forgotten.  Remove and fix the
commented out code section and finally remove the big TODO/FIXME
comment.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-14 18:28:56 -08:00
Ævar Arnfjörð Bjarmason 4a1baacd46 env-helper: move this built-in to "test-tool env-helper"
Since [1] there has been no reason for keeping "git env--helper" a
built-in. The reason it was a built-in to begin with was to support
the GIT_TEST_GETTEXT_POISON mode removed in that commit. I.e. unlike
the rest of "test-tool" it would potentially be called by the
installed git via "git-sh-i18n.sh".

As none of that applies since [1] we should stop carrying this
technical debt, and move it to t/helper/*. As this mostly move-only
change shows this has the nice bonus that we'll stop wasting time
translating the internal-only strings it emits.

Even though this was a built-in, it was intentionally never
documented, see its introduction in [2]. It never saw use outside of
the test suite, except for the "GIT_TEST_GETTEXT_POISON" use-case
noted above.

1. d162b25f95 (tests: remove support for GIT_TEST_GETTEXT_POISON,
   2021-01-20)
2. b4f207f339 (env--helper: new undocumented builtin wrapping
   git_env_*(), 2019-06-21)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-14 18:07:11 -08:00
Karthik Nayak 47cfc9bd7d attr: add flag --source to work with tree-ish
The contents of the .gitattributes files may evolve over time, but "git
check-attr" always checks attributes against them in the working tree
and/or in the index. It may be beneficial to optionally allow the users
to check attributes taken from a commit other than HEAD against paths.

Add a new flag `--source` which will allow users to check the
attributes against a commit (actually any tree-ish would do). When the
user uses this flag, we go through the stack of .gitattributes files but
instead of checking the current working tree and/or in the index, we
check the blobs from the provided tree-ish object. This allows the
command to also be used in bare repositories.

Since we use a tree-ish object, the user can pass "--source
HEAD:subdirectory" and all the attributes will be looked up as if
subdirectory was the root directory of the repository.

We cannot simply use the `<rev>:<path>` syntax without the `--source`
flag, similar to how it is used in `git show` because any non-flag
parameter before `--` is treated as an attribute and any parameter after
`--` is treated as a pathname.

The change involves creating a new function `read_attr_from_blob`, which
given the path reads the blob for the path against the provided source and
parses the attributes line by line. This function is plugged into
`read_attr()` function wherein we go through the stack of attributes
files.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Co-authored-by: toon@iotcl.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-14 08:49:55 -08:00
Karthik Nayak c847e8c228 t0003: move setup for --all into new block
There is some setup code which is used by multiple tests being setup in
`attribute test: --all option`. This means when we run "sh
./t0003-attributes.sh --run=setup,<num>" there is a chance of failing
since we missed this setup block.

So to ensure that setups are independent of test logic, move this to a
new setup block.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Co-authored-by: toon@iotcl.com
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-14 08:49:55 -08:00
Teng Long cf4936ed74 t3104: remove shift code in 'test_ls_tree_format'
In t3104-ls-tree-format.sh, There is a legacy 'shift 2' code
and the relevant code block no longer depends on it anymore,
so let's remove it for a small cleanup.

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13 15:09:23 -08:00
Johannes Schindelin de54b5fec4 bisect: no longer try to clean up left-over .git/head-name files
As per the code comment, the `.git/head-name` files were cleaned up for
backwards-compatibility: an old version of `git bisect` could have left
them behind.

Now, just how old would such a version be? As of 0f497e75f0 (Eliminate
confusing "won't bisect on seeked tree" failure, 2008-02-23), `git
bisect` does not write that file anymore. Which corresponds to Git
v1.5.4.4.

Even if the likelihood is non-nil that there might still be users out
there who use such an old version to start a bisection, but then decide
to continue bisecting with a current Git version, it is highly
improbable.

So let's remove that code, at long last.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13 14:17:14 -08:00
Johannes Schindelin 4de06fbd56 bisect run: fix the error message
In d1bbbe45df (bisect--helper: reimplement `bisect_run` shell function
in C, 2021-09-13), we ported the `bisect run` subcommand to C, including
the part that prints out an error message when the implicit `git bisect
bad` or `git bisect good` failed.

However, the error message was supposed to print out whether the state
was "good" or "bad", but used a bogus (because non-populated) `args`
variable for it. This was fixed in [1], but as of [2] (when
`bisect--helper` was changed to the present `bisect-state') the error
message still talks about implementation details that should not
concern end users.

Fix that, and add a regression test to ensure that the intended form of
the error message.

1. 80c2e9657f (bisect--helper: report actual bisect_state() argument
   on error, 2022-01-18
2. f37d0bdd42 (bisect: fix output regressions in v2.30.0, 2022-11-10)

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13 14:17:14 -08:00
Johannes Schindelin 2f645b33ba bisect: verify that a bogus option won't try to start a bisection
We do not want `git bisect --bogus-option` to start a bisection. To
verify that, we look for the tell-tale error message `You need to start
by "git bisect start"` and fail if it was found.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13 14:17:14 -08:00
Andrei Rybak a87a20cbb4 t7527: use test_when_finished in 'case insensitive+preserving'
Most tests in t7527-builtin-fsmonitor.sh that start a daemon, use the
helper function test_when_finished with stop_daemon_delete_repo.
Function stop_daemon_delete_repo explicitly stops the daemon.  Calling
it via test_when_finished is needed for tests that don't check daemon's
automatic shutdown logic [1] and it is needed to avoid daemons being
left running in case of breakage of the logic of automatic shutdown of
the daemon.

Unlike these tests, test 'case insensitive+preserving' added in [2] has
a call to function test_when_finished commented out.  It was commented
out in all versions of the patch [2] during development [3].  This seems
to not be intentional, because neither commit message in [2], nor the
comment above the test mention this line being commented out.  Compare
it, for example, to "# unicode_debug=true" which is explicitly described
by a documentation comment above it.

Uncomment test_when_finished for stop_daemon_delete_repo in test 'case
insensitive+preserving' to ensure that daemons are not left running in
cases when automatic shutdown logic of daemon itself is broken.

[1] See documentation in "fsmonitor--daemon.h" for details.
[2] caa9c37ec0 (t7527: test FSMonitor on case insensitive+preserving
    file system, 2022-05-26)
[3] See mailing list thread
    https://lore.kernel.org/git/41f8cbc2ae45cb86e299eb230ad3cb0319256c37.1653601644.git.gitgitgadget@gmail.com/T/#t

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13 12:06:10 -08:00
Andrei Rybak 5da4597297 t6422: drop commented out code
In commit [1] tests in t6422-merge-rename-corner-cases.sh were
refactored to not run setup steps separately.  This included replacing
all tests like

	test_expect_success "setup ..." '
		<code of setup>
	'

with corresponding Shell functions

	test_setup_... () {
		<code of setup>
	}

During this replacement first and last lines of one of such tests got
left commented out in code.  Drop these lines to avoid confusion.

[1] da1e295e00 (t604[236]: do not run setup in separate tests, 2019-10-22)

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13 12:05:47 -08:00
Andrei Rybak b3594800eb t6003: uncomment test '--max-age=c3, --topo-order'
Test '--max-age=c3, --topo-order' in t6003-rev-list-topo-order.sh has
been commented out as failing since its introduction in [1].  However,
the test is successful at least since commit [2] -- bisecting further is
harder because of incompatibility of such old Git code with modern
header file <openssl/bn.h> [3].

Uncomment this test to gain test coverage.

[1] f573571a21 ([PATCH] Add t/t6003 with some --topo-order tests,
    2005-07-07)
[2] 765ac8ec46 (Rip out merge-order and make "git log <paths>..." work
    again., 2006-02-28)
[3] BIGNUM used in git's `epoch.c` which was removed in [2] changed
    significantly between OpenSSL 1.0.2 and OpenSSL 1.1.0
    See also https://stackoverflow.com/a/42295243/1083697 and
    https://lore.kernel.org/git/Y71qiCs+oAS2OegH@coredump.intra.peff.net/

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13 12:05:41 -08:00
Đoàn Trần Công Danh b56be49984 date.c: allow ISO 8601 reduced precision times
ISO 8601 permits "reduced precision" time representations to omit the
seconds value or both the minutes and the seconds values.  The
abbreviate times could look like 17:45 or 1745 to omit the seconds,
or simply as 17 to omit both the minutes and the seconds.

parse_date_basic accepts the 17:45 format but it rejects the other two.
Change it to accept 4-digit and 2-digit time values when they follow a
recognized date and a 'T'.

Before this change:

$ TZ=UTC test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
2022-12-13T23:00 -> 2022-12-13 23:00:00 +0000
2022-12-13T2300 -> 2022-12-13 23:54:13 +0000
2022-12-13T23 -> 2022-12-13 23:54:13 +0000

After this change:

$ TZ=UTC helper/test-tool date approxidate 2022-12-13T23:00 2022-12-13T2300 2022-12-13T23
2022-12-13T23:00 -> 2022-12-13 23:00:00 +0000
2022-12-13T2300 -> 2022-12-13 23:00:00 +0000
2022-12-13T23 -> 2022-12-13 23:00:00 +0000

Note: ISO 8601 also allows reduced precision date strings such as
"2022-12" and "2022". This patch does not attempt to address these.

Reported-by: Pat LaVarre <plavarre@purestorage.com>
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13 11:49:04 -08:00
Jeff King fca2d86c97 t/interop: report which vanilla git command failed
The interop test library sets up wrappers "git.a" and "git.b" to
represent the two versions to be tested. It also wraps vanilla "git" to
report an error, with the goal of catching tests which accidentally fail
to use one of the version-specific wrappers (which could invalidate the
tests in a very subtle way).

But when it catches an invocation of vanilla git, it doesn't give any
details, which makes it very hard to debug exactly which invocation is
responsible (especially if it's buried in a function invocation, etc).
Let's report the arguments passed to git, which helps narrow it down.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-13 11:48:24 -08:00
Junio C Hamano bfc7ef3554 Merge branch 'js/drop-mingw-test-cmp'
Use `git diff --no-index` as a test_cmp on Windows.

We'd probably need to revisit "do we really want to, and have to,
lose CRLF vs LF?" later, at which time we may be able to further
clean this up by replacing "git diff --no-index" with "diff -u".

* js/drop-mingw-test-cmp:
  tests(mingw): avoid very slow `mingw_test_cmp`
2023-01-08 13:25:19 +09:00
Andrei Rybak b39a84185e *: fix typos which duplicate a word
Fix typos in code comments which repeat various words.  Most of the
cases are simple in that they repeat a word that usually cannot be
repeated in a grammatically correct sentence.  Just remove the
incorrectly duplicated word in these cases and rewrap text, if needed.

A tricky case is usage of "that that", which is sometimes grammatically
correct.  However, an instance of this in "t7527-builtin-fsmonitor.sh"
doesn't need two words "that", because there is only one daemon being
discussed, so replace the second "that" with "the".

Reword code comment "entries exist on on-disk index" in function
update_one in file cache-tree.c, by replacing incorrect preposition "on"
with "in".

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-08 10:28:34 +09:00
Derrick Stolee 17194b195d features: feature.manyFiles implies fast index writes
The recent addition of the index.skipHash config option allows index
writes to speed up by skipping the hash computation for the trailing
checksum. This is particularly critical for repositories with many files
at HEAD, so add this config option to two cases where users in that
scenario may opt-in to such behavior:

 1. The feature.manyFiles config option enables some options that are
    helpful for repositories with many files at HEAD.

 2. 'scalar register' and 'scalar reconfigure' set config options that
    optimize for large repositories.

In both of these cases, set index.skipHash=true to gain this
speedup. Add tests that demonstrate the proper way that
index.skipHash=true can override feature.manyFiles=true.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-07 07:46:14 +09:00
Derrick Stolee da9acde14e test-lib-functions: add helper for trailing hash
It can be helpful to check that a file format with a trailing hash has a
specific hash in the final bytes of a written file. This is made more
apparent by recent changes that allow skipping the hash algorithm and
writing a null hash at the end of the file instead.

Add a new test_trailing_hash helper and use it in t1600 to verify that
index.skipHash=true really does skip the hash computation, since
'git fsck' does not actually verify the hash. This confirms that when
the config is disabled explicitly in a super project but enabled in a
submodule, then the use of repo_config_get_bool() loads config from the
correct repository in the case of 'git add'. There are other cases where
istate->repo is NULL and thus this config is loaded instead from
the_repository, but that's due to many different code paths initializing
index_state structs in their own way.

Keep the 'git fsck' call to ensure that any potential future change to
check the index hash does not cause an error in this case.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-07 07:46:14 +09:00
Derrick Stolee ee1f0c242e read-cache: add index.skipHash config option
The previous change allowed skipping the hashing portion of the
hashwrite API, using it instead as a buffered write API. Disabling the
hashwrite can be particularly helpful when the write operation is in a
critical path.

One such critical path is the writing of the index. This operation is so
critical that the sparse index was created specifically to reduce the
size of the index to make these writes (and reads) faster.

This trade-off between file stability at rest and write-time performance
is not easy to balance. The index is an interesting case for a couple
reasons:

1. Writes block users. Writing the index takes place in many user-
   blocking foreground operations. The speed improvement directly
   impacts their use. Other file formats are typically written in the
   background (commit-graph, multi-pack-index) or are super-critical to
   correctness (pack-files).

2. Index files are short lived. It is rare that a user leaves an index
   for a long time with many staged changes. Outside of staged changes,
   the index can be completely destroyed and rewritten with minimal
   impact to the user.

Following a similar approach to one used in the microsoft/git fork [1],
add a new config option (index.skipHash) that allows disabling this
hashing during the index write. The cost is that we can no longer
validate the contents for corruption-at-rest using the trailing hash.

[1] 21fed2d914

We load this config from the repository config given by istate->repo,
with a fallback to the_repository if it is not set.

While older Git versions will not recognize the null hash as a special
case, the file format itself is still being met in terms of its
structure. Using this null hash will still allow Git operations to
function across older versions.

The one exception is 'git fsck' which checks the hash of the index file.
This used to be a check on every index read, but was split out to just
the index in a33fc72fe9 (read-cache: force_verify_index_checksum,
2017-04-14) and released first in Git 2.13.0. Document the versions that
relaxed these restrictions, with the optimistic expectation that this
change will be included in Git 2.40.0.

Here, we disable this check if the trailing hash is all zeroes. We add a
warning to the config option that this may cause undesirable behavior
with older Git versions.

As a quick comparison, I tested 'git update-index --force-write' with
and without index.skipHash=true on a copy of the Linux kernel
repository.

Benchmark 1: with hash
  Time (mean ± σ):      46.3 ms ±  13.8 ms    [User: 34.3 ms, System: 11.9 ms]
  Range (min … max):    34.3 ms …  79.1 ms    82 runs

Benchmark 2: without hash
  Time (mean ± σ):      26.0 ms ±   7.9 ms    [User: 11.8 ms, System: 14.2 ms]
  Range (min … max):    16.3 ms …  42.0 ms    69 runs

Summary
  'without hash' ran
    1.78 ± 0.76 times faster than 'with hash'

These performance benefits are substantial enough to allow users the
ability to opt-in to this feature, even with the potential confusion
with older 'git fsck' versions.

Test this new config option, both at a command-line level and within a
submodule. The confirmation is currently limited to confirm that 'git
fsck' does not complain about the index. Future updates will make this
test more robust.

It is critical that this test is placed before the test_index_version
tests, since those tests obliterate the .git/config file and hence lose
the setting from GIT_TEST_DEFAULT_HASH, if set.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-07 07:46:14 +09:00
Jeff King a0f83e7776 diff: use filespec path to set up tempfiles for ext-diff
When we're going to run an external diff, we have to make the contents
of the pre- and post-images available either by dumping them to a
tempfile, or by pointing at a valid file in the worktree. The logic of
this is all handled by prepare_temp_file(), and we just pass in the
filename and the diff_filespec.

But there's a gotcha here. The "filename" we have is a logical filename
and not necessarily a path on disk or in the repository. This matters in
at least one case: when using "--relative", we may have a name like
"foo", even though the file content is found at "subdir/foo". As a
result, we look for the wrong path, fail to find "foo", and claim that
the file has been deleted (passing "/dev/null" to the external diff,
rather than the correct worktree path).

We can fix this by passing the pathname from the diff_filespec, which
should always be a full repository path (and that's what we want even if
reusing a worktree file, since we're always operating from the top-level
of the working tree).

The breakage seems to go all the way back to cd676a5136 (diff
--relative: output paths as relative to the current subdirectory,
2008-02-12). As far as I can tell, before then "name" would always have
been the same as the filespec's "path".

There are two related cases I looked at that aren't buggy:

  1. the only other caller of prepare_temp_file() is run_textconv(). But
     it always passes the filespec's path field, so it's OK.

  2. I wondered if file renames/copies might cause similar confusion.
     But they don't, because run_external_diff() receives two names in
     that case: "name" and "other", which correspond to the two sides of
     the diff. And we did correctly pass "other" when handling the
     post-image side. Barring the use of "--relative", that would always
     match "two->path", the path of the second filespec (and the rename
     destination).

So the only bug is just the interaction with external diff drivers and
--relative.

Reported-by: Carl Baldwin <carl@ecbaldwin.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-06 21:49:55 +09:00
Jeff King d4e241a145 test-bundle-uri: drop unused variables
Commit 70b9c10373 (bundle-uri client: add helper for testing server,
2022-12-22) added a cmd_ls_remote() function which contains "uploadpack"
and "server_options" variables. Neither of these variables is ever
modified after being initialized, so the code to handle non-NULL and
non-empty values is impossible to reach.

While in theory we might add command-line parsing to set these, let's
drop the dead code for now in the name of cleanliness. It's easy enough
to add it back later if need be.

Noticed by Coverity.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-06 21:34:49 +09:00
Junio C Hamano d4c5400865 Merge branch 'ab/no-more-git-global-super-prefix'
Stop using "git --super-prefix" and narrow the scope of its use to
the submodule--helper.

* ab/no-more-git-global-super-prefix:
  read-tree: add "--super-prefix" option, eliminate global
  submodule--helper: convert "{update,clone}" to their own "--super-prefix"
  submodule--helper: convert "status" to its own "--super-prefix"
  submodule--helper: convert "sync" to its own "--super-prefix"
  submodule--helper: convert "foreach" to its own "--super-prefix"
  submodule--helper: don't use global --super-prefix in "absorbgitdirs"
  submodule.c & submodule--helper: pass along "super_prefix" param
  read-tree + fetch tests: test failing "--super-prefix" interaction
  submodule absorbgitdirs tests: add missing "Migrating git..." tests
2023-01-05 15:07:23 +09:00
Junio C Hamano bc58ebf84e Merge branch 'ab/bundle-wo-args'
Fix to a small regression in 2.38 days.

* ab/bundle-wo-args:
  bundle <cmd>: have usage_msg_opt() note the missing "<file>"
  builtin/bundle.c: remove superfluous "newargc" variable
  bundle: don't segfault on "git bundle <subcmd>"
2023-01-05 15:07:22 +09:00
Junio C Hamano 6f212b7c3f Merge branch 'sg/test-oid-wo-incomplete-line'
Test helper updates.

* sg/test-oid-wo-incomplete-line:
  tests: make 'test_oid' print trailing newline
2023-01-05 15:07:19 +09:00
Junio C Hamano 319c3abadb Merge branch 'sa/cat-file-mailmap--batch-check'
'cat-file' gains mailmap support for its '--batch-check' and '-s'
options.

* sa/cat-file-mailmap--batch-check:
  cat-file: add mailmap support to --batch-check option
  cat-file: add mailmap support to -s option
2023-01-05 15:07:17 +09:00
Thierry Reding 566902f2db am: allow passing --no-verify flag
The git-am --no-verify flag is analogous to the same flag passed to
git-commit. It bypasses the pre-applypatch and applypatch-msg hooks
if they are enabled.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-05 14:52:25 +09:00
William Sprent 5842710dc2 dir: check for single file cone patterns
The sparse checkout documentation states that the cone mode pattern set
is limited to patterns that either recursively include directories or
patterns that match all files in a directory. In the sparse checkout
file, the former manifest in the form:

    /A/B/C/

while the latter become a pair of patterns either in the form:

    /A/B/
    !/A/B/*/

or in the special case of matching the toplevel files:

    /*
    !/*/

The 'add_pattern_to_hashsets()' function contains checks which serve to
disable cone-mode when non-cone patterns are encountered. However, these
do not catch when the pattern list attempts to match a single file or
directory, e.g. a pattern in the form:

    /A/B/C

This causes sparse-checkout to exhibit unexpected behaviour when such a
pattern is in the sparse-checkout file and cone mode is enabled.
Concretely, with the pattern like the above, sparse-checkout, in
non-cone mode, will only include the directory or file located at
'/A/B/C'. However, with cone mode enabled, sparse-checkout will instead
just manifest the toplevel files but not any file located at '/A/B/C'.

Relatedly, issues occur when supplying the same kind of filter when
partial cloning with '--filter=sparse:oid=<oid>'. 'upload-pack' will
correctly just include the objects that match the non-cone pattern
matching. Which means that checking out the newly cloned repo with the
same filter, but with cone mode enabled, fails due to missing objects.

To fix these issues, add a cone mode pattern check that asserts that
every pattern is either a directory match or the pattern '/*'. Add a
test to verify the new pattern check and modify another to reflect that
non-directory patterns are caught earlier.

Signed-off-by: William Sprent <williams@unity3d.com>
Acked-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-05 11:14:28 +09:00
Junio C Hamano e83d57e34a Merge branch 'ew/format-patch-mboxrd'
"git format-patch" learned to honor format.mboxrd even when sending
patches to the standard output stream,

* ew/format-patch-mboxrd:
  format-patch: support format.mboxrd with --stdout
2023-01-02 21:37:19 +09:00
Junio C Hamano 0903d8bbde Merge branch 'ds/bundle-uri-4'
Bundle URIs part 4.

* ds/bundle-uri-4:
  clone: unbundle the advertised bundles
  bundle-uri: download bundles from an advertised list
  bundle-uri: allow relative URLs in bundle lists
  strbuf: introduce strbuf_strip_file_from_path()
  bundle-uri: serve bundle.* keys from config
  bundle-uri client: add helper for testing server
  transport: rename got_remote_heads
  bundle-uri client: add boolean transfer.bundleURI setting
  clone: request the 'bundle-uri' command when available
  t: create test harness for 'bundle-uri' command
  protocol v2: add server-side "bundle-uri" skeleton
2023-01-02 21:37:18 +09:00
Junio C Hamano 3f2e4c09c7 Merge branch 'lk/line-range-parsing-fix'
When given a pattern that matches an empty string at the end of a
line, the code to parse the "git diff" line-ranges fell into an
infinite loop, which has been corrected.

* lk/line-range-parsing-fix:
  line-range: fix infinite loop bug with '$' regex
2023-01-02 21:37:18 +09:00
Junio C Hamano 48475f43a0 Merge branch 'sa/git-var-sequence-editor'
Just like "git var GIT_EDITOR" abstracts the complex logic to
choose which editor gets used behind it, "git var" now give support
to GIT_SEQUENCE_EDITOR.

* sa/git-var-sequence-editor:
  var: add GIT_SEQUENCE_EDITOR variable
2022-12-28 12:06:17 +09:00
Junio C Hamano e57caee004 Merge branch 'pg/diff-stat-unmerged-regression-fix'
The output from "git diff --stat" on an unmerged path lost the
terminating LF in Git 2.39, which has been corrected.

* pg/diff-stat-unmerged-regression-fix:
  diff: fix regression with --stat and unmerged file
2022-12-26 11:42:07 +09:00
Junio C Hamano 78d15022e7 Merge branch 'jk/ref-filter-error-reporting-fix'
Clean-ups in error messages produced by "git for-each-ref" and friends.

* jk/ref-filter-error-reporting-fix:
  ref-filter: convert email atom parser to use err_bad_arg()
  ref-filter: truncate atom names in error messages
  ref-filter: factor out "unrecognized %(foo) arg" errors
  ref-filter: factor out "%(foo) does not take arguments" errors
  ref-filter: reject arguments to %(HEAD)
2022-12-26 11:42:06 +09:00
Junio C Hamano 4a9b839dd1 Merge branch 'sg/help-autocorrect-config-fix'
The code to auto-correct a misspelt subcommand unnecessarily called
into git_default_config() from the early config codepath, which was
a no-no.  This has bee corrected.

* sg/help-autocorrect-config-fix:
  help.c: fix autocorrect in work tree for bare repository
2022-12-26 11:42:04 +09:00
Ævar Arnfjörð Bjarmason 4002ec3dcf read-tree: add "--super-prefix" option, eliminate global
The "--super-prefix" option to "git" was initially added in [1] for
use with "ls-files"[2], and shortly thereafter "submodule--helper"[3]
and "grep"[4]. It wasn't until [5] that "read-tree" made use of it.

At the time [5] made sense, but since then we've made "ls-files"
recurse in-process in [6], "grep" in [7], and finally
"submodule--helper" in the preceding commits.

Let's also remove it from "read-tree", which allows us to remove the
option to "git" itself.

We can do this because the only remaining user of it is the submodule
API, which will now invoke "read-tree" with its new "--super-prefix"
option. It will only do so when the "submodule_move_head()" function
is called.

That "submodule_move_head()" function was then only invoked by
"read-tree" itself, but now rather than setting an environment
variable to pass "--super-prefix" between cmd_read_tree() we:

- Set a new "super_prefix" in "struct unpack_trees_options". The
  "super_prefixed()" function in "unpack-trees.c" added in [5] will now
  use this, rather than get_super_prefix() looking up the environment
  variable we set earlier in the same process.

- Add the same field to the "struct checkout", which is only needed to
  ferry the "super_prefix" in the "struct unpack_trees_options" all the
  way down to the "entry.c" callers of "submodule_move_head()".

  Those calls which used the super prefix all originated in
  "cmd_read_tree()". The only other caller is the "unlink_entry()"
  caller in "builtin/checkout.c", which now passes a "NULL".

1. 74866d7579 (git: make super-prefix option, 2016-10-07)
2. e77aa336f1 (ls-files: optionally recurse into submodules, 2016-10-07)
3. 89c8626557 (submodule helper: support super prefix, 2016-12-08)
4. 0281e487fd (grep: optionally recurse into submodules, 2016-12-16)
5. 3d415425c7 (unpack-trees: support super-prefix option, 2017-01-17)
6. 188dce131f (ls-files: use repository object, 2017-06-22)
7. f9ee2fcdfa (grep: recurse in-process using 'struct repository', 2017-08-02)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-26 10:21:44 +09:00
Ævar Arnfjörð Bjarmason bb61a962d2 submodule--helper: don't use global --super-prefix in "absorbgitdirs"
The "--super-prefix" facility was introduced in [1] has always been a
transitory hack, which is why we've made it an error to supply it as
an option to "git" to commands that don't know about it.

That's been a good goal, as it has a global effect we haven't wanted
calls to get_super_prefix() from built-ins we didn't expect.

But it has meant that when we've had chains of different built-ins
using it all of the processes in that "chain" have needed to support
it, and worse processes that don't need it have needed to ask for
"SUPPORT_SUPER_PREFIX" because their parent process needs it.

That's how "fsmonitor--daemon" ended up with it, per [2] it's called
from (among other things) "submodule--helper absorbgitdirs", but as we
declared "submodule--helper" as "SUPPORT_SUPER_PREFIX" we needed to
declare "fsmonitor--daemon" as accepting it too, even though it
doesn't care about it.

But in the case of "absorbgitdirs" it only needed "--super-prefix" to
invoke itself recursively, and we'd never have another "in-between"
process in the chain. So we didn't need the bigger hammer of "git
--super-prefix", and the "setenv(GIT_SUPER_PREFIX_ENVIRONMENT, ...)"
that it entails.

Let's instead accept a hidden "--super-prefix" option to
"submodule--helper absorbgitdirs" itself.

Eventually (as with all other "--super-prefix" users) we'll want to
clean this code up so that this all happens in-process. I.e. needing
any variant of "--super-prefix" is itself a hack around our various
global state, and implicit reliance on "the_repository". This stepping
stone makes such an eventual change easier, as we'll need to deal with
less global state at that point.

The "fsmonitor--daemon" test adjusted here was added in [3]. To assert
that it didn't run into the "--super-prefix" message it was asserting
the output it didn't have. Let's instead assert the full output that
we *do* have, using the same pattern as a preceding change to
"t/t7412-submodule-absorbgitdirs.sh" used.

We could also remove the test entirely (as [4] did), but even though
the initial reason for having it is gone we're still getting some
marginal benefit from testing the "fsmonitor" and "submodule
absorbgitdirs" interaction, so let's keep it.

The change here to have either a NULL or non-"" string as a
"super_prefix" instead of the previous arrangement of "" or non-"" is
somewhat arbitrary. We could also decide to never have to check for
NULL.

As we'll be changing the rest of the "git --super-prefix" users to the
same pattern, leaving them all consistent makes sense. Why not pick ""
over NULL? Because that's how the "prefix" works[5], and having
"prefix" and "super_prefix" work the same way will be less
confusing. That "prefix" picked NULL instead of "" is itself
arbitrary, but as it's easy to make this small bit of our overall API
consistent, let's go with that.

1. 74866d7579 (git: make super-prefix option, 2016-10-07)
2. 53fcfbc84f (fsmonitor--daemon: allow --super-prefix argument,
   2022-05-26)
3. 53fcfbc84f (fsmonitor--daemon: allow --super-prefix argument,
   2022-05-26)
4. https://lore.kernel.org/git/20221109004708.97668-5-chooglen@google.com/
5. 9725c8dda2 (built-ins: trust the "prefix" from run_builtin(),
   2022-02-16)

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-26 10:21:43 +09:00
Glen Choo 0d1806e53d read-tree + fetch tests: test failing "--super-prefix" interaction
Ever since "git fetch --refetch" was introduced in 0f5e885173 (Merge
branch 'rc/fetch-refetch', 2022-04-04) the test being added here would
fail. This is because "restore" will "read-tree .. --reset <hash>",
which will in turn invoke "fetch". The "fetch" will then die with:

	fatal: fetch doesn't support --super-prefix

This edge case and other "--super-prefix" bugs will be fixed in
subsequent commits, but let's first add a "test_expect_failure" test
for it. It passes until the very last command in the test.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-26 10:21:43 +09:00
Ævar Arnfjörð Bjarmason 49eb1d388a submodule absorbgitdirs tests: add missing "Migrating git..." tests
Fix a blind spots in the tests surrounding "submodule absorbgitdirs"
and test what output we emit, and how emitted the message and behavior
interacts with a "git worktree" where the repository isn't at the base
of the working directory.

The "$(pwd)" instead of "$PWD" here is needed due to Windows, where
the latter will be a path like "/d/a/git/[...]", whereas we need
"D:/a/git/[...]".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-26 10:21:43 +09:00
Eric Wong 4810946f60 format-patch: support format.mboxrd with --stdout
mboxrd is a more robust output format when used with --stdout
and needs more exposure.  Introducing this config knob lets
users choose the more robust format for all their --stdout
uses.

Relying on --pretty=mboxrd and including all of pretty-formats.txt
in the `git format-patch' documentation would likely be
confusing to users.  Furthermore, this setting is useful across
multiple invocations.  So introduce `format.mboxrd' as a boolean
configuration knob that changes the default --pretty=email format
to --pretty=mboxrd when (and only when) --stdout is in use.

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:32:45 +09:00
Derrick Stolee 876094ac16 clone: unbundle the advertised bundles
A previous change introduced the transport methods to acquire a bundle
list from the 'bundle-uri' protocol v2 command, when advertised _and_
when the client has chosen to enable the feature.

Teach Git to download and unbundle the data advertised by those bundles
during 'git clone'. This takes place between the ref advertisement and
the object data download, and stateful connections will linger while
the client downloads bundles. In the future, we should consider closing
the remote connection during this process.

Also, since the --bundle-uri option exists, we do not want to mix the
advertised bundles with the user-specified bundles.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:24:24 +09:00
Derrick Stolee ebc3947955 bundle-uri: allow relative URLs in bundle lists
Bundle providers may want to distribute that data across multiple CDNs.
This might require a change in the base URI, all the way to the domain
name. If all bundles require an absolute URI in their 'uri' value, then
every push to a CDN would require altering the table of contents to
match the expected domain and exact location within it.

Allow a bundle list to specify a relative URI for the bundles. This URI
is based on where the client received the bundle list. For a list
provided in the 'bundle-uri' protocol v2 command, the Git remote URI is
the base URI. Otherwise, the bundle list was provided from an HTTP URI
not using the Git protocol, and that URI is the base URI. This allows
easier distribution of bundle data.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:24:24 +09:00
Derrick Stolee 738dc7d4a5 bundle-uri: serve bundle.* keys from config
Implement the "bundle-uri" protocol v2 capability by populating the
key=value packet lines from the local Git config. The list of bundles is
provided from the keys beginning with "bundle.".

In the future, we may want to filter this list to be more specific to
the exact known keys that the server intends to share, but for
flexibility at the moment we will assume that the config values are
well-formed.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:24:24 +09:00
Ævar Arnfjörð Bjarmason 70b9c10373 bundle-uri client: add helper for testing server
Add a 'test-tool bundle-uri ls-remote' command. This is a thin wrapper
for issuing protocol v2 "bundle-uri" commands to a server, and to the
parsing routines in bundle-uri.c.

In the "git clone" case we'll have already done the handshake(),
but not here. Add an extra case to check for this handshake in
get_bundle_uri() for ease of use for future callers.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:24:24 +09:00
Ævar Arnfjörð Bjarmason 7cce9074a7 bundle-uri client: add boolean transfer.bundleURI setting
The yet-to-be introduced client support for bundle-uri will always
fall back on a full clone, but we'd still like to be able to ignore a
server's bundle-uri advertisement entirely.

The new transfer.bundleURI config option defaults to 'false', but a user
can set it to 'true' to enable checking for bundle URIs from the origin
Git server using protocol v2.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:24:23 +09:00
Ævar Arnfjörð Bjarmason 0cfde740f0 clone: request the 'bundle-uri' command when available
Set up all the needed client parts of the 'bundle-uri' protocol v2
command, without actually doing anything with the bundle URIs.

If the server says it supports 'bundle-uri' teach Git to issue the
'bundle-uri' command after the 'ls-refs' during 'git clone'. The
returned key=value pairs are passed to the bundle list code which is
tested using a different ingest mechanism in t5750-bundle-uri-parse.sh.

At this point, Git does nothing with that bundle list. It will not
download any of the bundles. That will come in a later change after
these protocol bits are finalized.

The no-op client is initially used only by 'git clone' to test the basic
functionality, and eventually will bootstrap the initial download of Git
objects during a fresh clone. The bundle URI client will not be
integrated into other fetches until a mechanism is created to select a
subset of bundles for download.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:24:23 +09:00
Ævar Arnfjörð Bjarmason 8f788eb8b7 t: create test harness for 'bundle-uri' command
The previous change allowed for a Git server to advertise the
'bundle-uri' command as a capability based on the
uploadPack.advertiseBundleURIs config option. Create a set of tests that
check that this capability is advertised using 'git ls-remote'.

In order to test this functionality across three protocols (file, git,
and http), create lib-bundle-uri-protocol.sh to generalize the tests,
allowing the other test scripts to set an environment variable and
otherwise inherit the setup and tests from this script.

The tests currently only test that the 'bundle-uri' command is
advertised or not. Other actions will be tested as the Git client learns
to request the 'bundle-uri' command and parse its response.

To help with URI escaping, specifically for file paths with a space in
them, extract a 'sed' invocation from t9199-git-svn-info.sh into a
helper function for use here, too.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:24:23 +09:00
Ævar Arnfjörð Bjarmason 8b8d9a2298 protocol v2: add server-side "bundle-uri" skeleton
Add a skeleton server-side implementation of a new "bundle-uri" command
to protocol v2. This will allow conforming clients to optionally seed
their initial clones or incremental fetches from URLs containing
"*.bundle" files created with "git bundle create".

This change only performs the basic boilerplate of advertising a new
protocol v2 capability. The new 'bundle-uri' capability allows a client
to request a list of bundles. Right now, the server only returns a flush
packet, which corresponds to an empty advertisement. The bundle.* config
namespace describes which key-value pairs will be communicated across
this interface in future updates.

The critical bit right now is that the new boolean
uploadPack.adverstiseBundleURIs config value signals whether or not this
capability should be advertised at all.

An earlier version of this patch [1] used a different transfer format
than the "key=value" pairs in the current implementation. The change was
made to unify the protocol v2 command with the bundle lists provided by
independent bundle servers. Further, the standard allows for the server
to advertise a URI that contains a bundle list. This allows users
automatically discovering bundle providers that are loosely associated
with the origin server, but without the origin server knowing exactly
which bundles are currently available.

[1] https://lore.kernel.org/git/RFC-patch-v2-01.13-2fc87ce092b-20220311T155841Z-avarab@gmail.com/

The very-deep headings needed to be modified to stop at level 4 due to
documentation build issues. These were not recognized in earlier builds
since the file was previously in the Documentation/technical/ directory
and was built in a different way. With its current location, the
heavily-nested details were causing build issues and they are now
replaced with a bulletted list of details.

Co-authored-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:24:23 +09:00
Ævar Arnfjörð Bjarmason 891cb09db6 bundle: don't segfault on "git bundle <subcmd>"
Since aef7d75e58 (builtin/bundle.c: let parse-options parse
subcommands, 2022-08-19) we've been segfaulting if no argument was
provided.

The fix is easy, as all of the "git bundle" subcommands require a
non-option argument we can check that we have arguments left after
calling parse-options().

This makes use of code added in 73c3253d75 (bundle: framework for
options before bundle file, 2019-11-10), before this change that code
has always been unreachable. In 73c3253d75 we'd never reach it as we
already checked "argc < 2" in cmd_bundle() itself.

Then when aef7d75e58 (whose segfault we're fixing here) migrated this
code to the subcommand API it removed that "argc < 2" check, but we
were still checking the wrong "argc" in parse_options_cmd_bundle(), we
need to check the "newargc". The "argc" will always be >= 1, as it
will necessarily contain at least the subcommand name
itself (e.g. "create").

As an aside, this could be safely squashed into this, but let's not do
that for this minimal segfault fix, as it's an unrelated refactoring:

	--- a/builtin/bundle.c
	+++ b/builtin/bundle.c
	@@ -55,13 +55,12 @@ static int parse_options_cmd_bundle(int argc,
	 		const char * const usagestr[],
	 		const struct option options[],
	 		char **bundle_file) {
	-	int newargc;
	-	newargc = parse_options(argc, argv, NULL, options, usagestr,
	+	argc = parse_options(argc, argv, NULL, options, usagestr,
	 			     PARSE_OPT_STOP_AT_NON_OPTION);
	-	if (!newargc)
	+	if (!argc)
	 		usage_with_options(usagestr, options);
	 	*bundle_file = prefix_filename(prefix, argv[0]);
	-	return newargc;
	+	return argc;
	 }

	 static int cmd_bundle_create(int argc, const char **argv, const char *prefix) {

Reported-by: Hubert Jasudowicz <hubertj@stmcyber.pl>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Tested-by: Hubert Jasudowicz <hubertj@stmcyber.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-25 16:01:09 +09:00
Siddharth Asthana a797c0ea04 cat-file: add mailmap support to --batch-check option
Even though the cat-file command with `--batch-check` option does not
complain when `--use-mailmap` option is given, the latter option is
ignored. Compute the size of the object after replacing the idents and
report it instead.

In order to make `--batch-check` option honour the mailmap mechanism we
have to read the contents of the commit/tag object.

There were two ways to do it:

1. Make two calls to `oid_object_info_extended()`. If `--use-mailmap`
   option is given, the first call will get us the type of the object
   and second call will only be made if the object type is either a
   commit or tag to get the contents of the object.

2. Make one call to `oid_object_info_extended()` to get the type of the
   object. Then, if the object type is either of commit or tag, make a
   call to `repo_read_object_file()` to read the contents of the object.

I benchmarked the following command with both the above approaches and
compared against the current implementation where `--use-mailmap`
option is ignored:

`git cat-file --use-mailmap --batch-all-objects --batch-check --buffer
--unordered`

The results can be summarized as follows:
                       Time (mean ± σ)
default               827.7 ms ± 104.8 ms
first approach        6.197 s ± 0.093 s
second approach       1.975 s ± 0.217 s

Since, the second approach is faster than the first one, I implemented
it in this patch.

The command git cat-file can now use the mailmap mechanism to replace
idents with canonical versions for commit and tag objects. There are
several options like `--batch`, `--batch-check` and `--batch-command`
that can be combined with `--use-mailmap`. But the documentation for
`--batch`, `--batch-check` and `--batch-command` doesn't say so. This
patch fixes that documentation.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-20 15:20:45 +09:00
Siddharth Asthana 49050a043b cat-file: add mailmap support to -s option
Even though the cat-file command with `-s` option does not complain when
`--use-mailmap` option is given, the latter option is ignored. Compute
the size of the object after replacing the idents and report it instead.

In order to make `-s` option honour the mailmap mechanism we have to
read the contents of the commit/tag object. Make use of the call to
`oid_object_info_extended()` to get the contents of the object and store
in `buf`. `buf` is later freed in the function.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-20 15:20:45 +09:00
Lars Kellogg-Stedman 4e57c88e02 line-range: fix infinite loop bug with '$' regex
When the -L argument to "git log" is passed the zero-width regular
expression "$" (as in "-L :$:line-range.c"), this results in an
infinite loop in find_funcname_matching_regexp().

Modify find_funcname_matching_regexp to correctly match the entire line
instead of the zero-width match at eol and update the loop condition to
prevent an infinite loop in the event of other undiscovered corner cases.

The primary change is that we pre-decrement the beginning-of-line marker
('bol') before comparing it to '\n'. In the case of '$', where we match the
'\n' at the end of the line and start the loop with bol == eol, this
ensures that bol will find the beginning of the line on which the match
occurred.

Originally reported in <https://stackoverflow.com/q/74690545/147356>.

Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-20 10:00:43 +09:00
Junio C Hamano 963f8d3b63 Merge branch 'rj/branch-copy-and-rename'
Fix a pair of bugs in 'git branch'.

* rj/branch-copy-and-rename:
  branch: force-copy a branch to itself via @{-1} is a no-op
2022-12-19 11:46:18 +09:00
Junio C Hamano f3d9bc801a Merge branch 'rr/status-untracked-advice'
The advice message given by "git status" when it takes long time to
enumerate untracked paths has been updated.

* rr/status-untracked-advice:
  status: modernize git-status "slow untracked files" advice
2022-12-19 11:46:18 +09:00
Junio C Hamano 053650ddad Merge branch 'aw/complete-case-insensitive'
Introduce a case insensitive mode to the Bash completion helpers.

* aw/complete-case-insensitive:
  completion: add case-insensitive match of pseudorefs
  completion: add optional ignore-case when matching refs
2022-12-19 11:46:18 +09:00
Junio C Hamano 3c0a988672 Merge branch 'rs/t3920-crlf-eating-grep-fix'
Test fix.

* rs/t3920-crlf-eating-grep-fix:
  t3920: support CR-eating grep
2022-12-19 11:46:14 +09:00
Junio C Hamano b7bb8828cf Merge branch 'js/t3920-shell-and-or-fix'
Test fix.

* js/t3920-shell-and-or-fix:
  t3920: don't ignore errors of more than one command with `|| true`
2022-12-19 11:46:14 +09:00
Junio C Hamano 314a0af909 Merge branch 'ab/t4023-avoid-losing-exit-status-of-diff'
Test fix.

* ab/t4023-avoid-losing-exit-status-of-diff:
  t4023: fix ignored exit codes of git
2022-12-19 11:46:13 +09:00
Junio C Hamano 4eec47c1cd Merge branch 'ab/t7600-avoid-losing-exit-status-of-git'
Test fix.

* ab/t7600-avoid-losing-exit-status-of-git:
  t7600: don't ignore "rev-parse" exit code in helper
2022-12-19 11:46:13 +09:00
Junio C Hamano d2caf09d00 Merge branch 'ab/t5314-avoid-losing-exit-status'
Test fix.

* ab/t5314-avoid-losing-exit-status:
  t5314: check exit code of "git"
2022-12-19 11:46:13 +09:00
Junio C Hamano 907951c88b Merge branch 'rs/t4205-do-not-exit-in-test-script'
Test fix.

* rs/t4205-do-not-exit-in-test-script:
  t4205: don't exit test script on failure
2022-12-19 11:46:12 +09:00
SZEDER Gábor a48a88019b tests: make 'test_oid' print trailing newline
Unlike other test helper functions, 'test_oid' doesn't terminate its
output with a LF, but, alas, the reason for this, if any, is not
mentioned in 2c02b110da (t: add test functions to translate
hash-related values, 2018-09-13)).

Now, in the vast majority of cases 'test_oid' is invoked in a command
substitution that is part of a heredoc or supplies an argument to a
command or the value to a variable, and the command substitution would
chop off any trailing LFs, so in these cases the lack or presence of a
trailing LF in its output doesn't matter.  However:

  - There appear to be only three cases where 'test_oid' is not
    invoked in a command substitution:

      $ git grep '\stest_oid ' -- ':/t/*.sh'
      t0000-basic.sh:  test_oid zero >actual &&
      t0000-basic.sh:  test_oid zero >actual &&
      t0000-basic.sh:  test_oid zero >actual &&

    These are all in test cases checking that 'test_oid' actually
    works, and that the size of its output matches the size of the
    corresponding hash function with conditions like

      test $(wc -c <actual) -eq 40

    In these cases the lack of trailing LF does actually matter,
    though they could be trivially updated to account for the presence
    of a trailing LF.

  - There are also a few cases where the lack of trailing LF in
    'test_oid's output actually hurts, because tests need to compare
    its output with LF terminated file contents, forcing developers to
    invoke it as 'echo $(test_oid ...)' to append the missing LF:

      $ git grep 'echo "\?$(test_oid ' -- ':/t/*.sh'
      t1302-repo-version.sh:  echo $(test_oid version) >expect &&
      t1500-rev-parse.sh:     echo "$(test_oid algo)" >expect &&
      t4044-diff-index-unique-abbrev.sh:      echo "$(test_oid val1)" > foo &&
      t4044-diff-index-unique-abbrev.sh:      echo "$(test_oid val2)" > foo &&
      t5313-pack-bounds-checks.sh:    echo $(test_oid oidfff) >file &&

    And there is yet another similar case in an in-flight topic at:

      https://public-inbox.org/git/813e81a058227bd373cec802e443fcd677042fb4.1670862677.git.gitgitgadget@gmail.com/

Arguably we would be better off if 'test_oid' terminated its output
with a LF.  So let's update 'test_oid' accordingly, update its tests
in t0000 to account for the extra character in those size tests, and
remove the now unnecessary 'echo $(...)' command substitutions around
'test_oid' invocations as well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-19 09:49:11 +09:00
Sean Allred 4c3dd9304e var: add GIT_SEQUENCE_EDITOR variable
The editor program used by Git when editing the sequencer "todo" file
is determined by examining a few environment variables and also
affected by configuration variables. Introduce "git var
GIT_SEQUENCE_EDITOR" to give users access to the final result of the
logic without having to know the exact details.

This is very similar in spirit to 44fcb497 (Teach git var about
GIT_EDITOR, 2009-11-11) that introduced "git var GIT_EDITOR".

Signed-off-by: Sean Allred <allred.sean@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-18 11:48:26 +09:00
Jeff King 1955ef10ed ref-filter: truncate atom names in error messages
If you pass a bogus argument to %(refname), you may end up with a
message like this:

  $ git for-each-ref --format='%(refname:foo)'
  fatal: unrecognized %(refname:foo) argument: foo

which is confusing. It should just say:

  fatal: unrecognized %(refname) argument: foo

which is clearer, and is consistent with most other atom parsers. Those
other parsers do not have the same problem because they pass the atom
name from a string literal in the parser function. But because the
parser for %(refname) also handles %(upstream) and %(push), it instead
uses atom->name, which includes the arguments. The oid atom parser which
handles %(tree), %(parent), etc suffers from the same problem.

It seems like the cleanest fix would be for atom->name to be _just_ the
name, since there's already a separate "args" field. But since that
field is also used for other things, we can't change it easily (e.g.,
it's how we find things in the used_atoms array, and clearly %(refname)
and %(refname:short) are not the same thing).

Instead, we'll teach our error_bad_arg() function to stop at the first
":". This is a little hacky, as we're effectively re-parsing the name,
but the format is simple enough to do this as a one-liner, and this
localizes the change to the error-reporting code.

We'll give the same treatment to err_no_arg(). None of its callers use
this atom->name trick, but it's worth future-proofing it while we're
here.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15 09:14:04 +09:00
Jeff King dda4fc1a84 ref-filter: factor out "unrecognized %(foo) arg" errors
Atom parsers that take arguments generally have a catch-all for "this
arg is not recognized". Most of them use the same printf template, which
is good, because it makes life easier for translators. Let's pull this
template into a helper function, which makes the code in the parsers
shorter and avoids any possibility of differences.

As with the previous commit, we'll pick an arbitrary atom to make sure
the test suite covers this code.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15 09:14:00 +09:00
Jeff King a33d0fae76 ref-filter: factor out "%(foo) does not take arguments" errors
Many atom parsers give the same error message, differing only in the
name of the atom. If we use "%s does not take arguments", that should
make life easier for translators, as they only need to translate one
string. And in doing so, we can easily pull it into a helper function to
make sure they are all using the exact same string.

I've added a basic test here for %(HEAD), just to make sure this code is
exercised at all in the test suite. We could cover each such atom, but
the effort-to-reward ratio of trying to maintain an exhaustive list
doesn't seem worth it.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15 09:13:56 +09:00
Peter Grayson 209d9cb011 diff: fix regression with --stat and unmerged file
A regression was introduced in

  12fc4ad89e (diff.c: use utf8_strwidth() to count display width, 2022-09-14)

that causes missing newlines after "Unmerged" entries in `git diff
--cached --stat` output.

This problem affects v2.39.0-rc0 through v2.39.0.

Add the missing newline along with a new test to cover this
behavior.

Signed-off-by: Peter Grayson <pete@jpgrayson.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15 09:12:04 +09:00
Junio C Hamano 26f81233ab Merge branch 'js/t0021-windows-pwd'
Test fix.

* js/t0021-windows-pwd:
  t0021: use Windows-friendly `pwd`
2022-12-14 17:42:18 +09:00
Junio C Hamano d818458088 Merge branch 'sa/git-var-empty'
"git var UNKNOWN_VARIABLE" and "git var VARIABLE" with the variable
given an empty value used to behave identically.  Now the latter
just gives an empty output, while the former still gives an error
message.

* sa/git-var-empty:
  var: allow GIT_EDITOR to return null
  var: do not print usage() with a correct invocation
2022-12-14 15:55:47 +09:00
Junio C Hamano cb3d2e535a Merge branch 'rs/multi-filter-args'
Fix a bug where `pack-objects` would not respect multiple `--filter`
arguments when invoked directly.

* rs/multi-filter-args:
  list-objects-filter: remove OPT_PARSE_LIST_OBJECTS_FILTER_INIT()
  pack-objects: simplify --filter handling
  pack-objects: fix handling of multiple --filter options
  t5317: demonstrate failure to handle multiple --filter options
  t5317: stop losing return codes of git ls-files
2022-12-14 15:55:47 +09:00
Junio C Hamano a1b8e5ec28 Merge branch 'tl/pack-bitmap-absolute-paths'
The pack-bitmap machinery is taught to log the paths of redundant
bitmap(s) to trace2 instead of stderr.

* tl/pack-bitmap-absolute-paths:
  pack-bitmap.c: trace bitmap ignore logs when midx-bitmap is found
  pack-bitmap.c: break out of the bitmap loop early if not tracing
  pack-bitmap.c: avoid exposing absolute paths
  pack-bitmap.c: remove unnecessary "open_pack_index()" calls
2022-12-14 15:55:46 +09:00
Junio C Hamano 9ea1378d04 Merge branch 'ab/various-leak-fixes'
Various leak fixes.

* ab/various-leak-fixes:
  built-ins: use free() not UNLEAK() if trivial, rm dead code
  revert: fix parse_options_concat() leak
  cherry-pick: free "struct replay_opts" members
  rebase: don't leak on "--abort"
  connected.c: free the "struct packed_git"
  sequencer.c: fix "opts->strategy" leak in read_strategy_opts()
  ls-files: fix a --with-tree memory leak
  revision API: call graph_clear() in release_revisions()
  unpack-file: fix ancient leak in create_temp_file()
  built-ins & libs & helpers: add/move destructors, fix leaks
  dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache"
  read-cache.c: clear and free "sparse_checkout_patterns"
  commit: discard partial cache before (re-)reading it
  {reset,merge}: call discard_index() before returning
  tests: mark tests as passing with SANITIZE=leak
2022-12-14 15:55:46 +09:00
Junio C Hamano 7576e512ce Merge branch 'kz/merge-tree-merge-base'
"merge-tree" learns a new `--merge-base` option.

* kz/merge-tree-merge-base:
  docs: fix description of the `--merge-base` option
  merge-tree.c: allow specifying the merge-base when --stdin is passed
  merge-tree.c: add --merge-base=<commit> option
2022-12-14 15:55:46 +09:00
Junio C Hamano bee6e7a8f9 Merge branch 'dd/git-bisect-builtin'
`git bisect` becomes a builtin.

* dd/git-bisect-builtin:
  bisect; remove unused "git-bisect.sh" and ".gitignore" entry
  Turn `git bisect` into a full built-in
  bisect--helper: log: allow arbitrary number of arguments
  bisect--helper: handle states directly
  bisect--helper: emit usage for "git bisect"
  bisect test: test exit codes on bad usage
  bisect--helper: identify as bisect when report error
  bisect-run: verify_good: account for non-negative exit status
  bisect run: keep some of the post-v2.30.0 output
  bisect: fix output regressions in v2.30.0
  bisect: refactor bisect_run() to match CodingGuidelines
  bisect tests: test for v2.30.0 "bisect run" regressions
2022-12-14 15:55:45 +09:00
Junio C Hamano 96738bb0e1 Sync with 2.38.3 2022-12-13 21:25:15 +09:00
Junio C Hamano fea9f607a8 Sync with Git 2.37.5 2022-12-13 21:23:36 +09:00
Junio C Hamano 431f6e67e6 Merge branch 'maint-2.36' into maint-2.37 2022-12-13 21:20:35 +09:00
Junio C Hamano 8253c00421 Merge branch 'maint-2.35' into maint-2.36 2022-12-13 21:19:11 +09:00
Junio C Hamano fbabbc30e7 Merge branch 'maint-2.34' into maint-2.35 2022-12-13 21:17:10 +09:00
Junio C Hamano 3748b5b7f5 Merge branch 'maint-2.33' into maint-2.34 2022-12-13 21:15:22 +09:00
Junio C Hamano 5f22dcc02d Sync with Git 2.32.5 2022-12-13 21:13:11 +09:00
Junio C Hamano 32e357b6df Merge branch 'ps/attr-limits-with-fsck' into maint-2.32 2022-12-13 21:09:56 +09:00
Junio C Hamano 8a755eddf5 Sync with Git 2.31.6 2022-12-13 21:09:40 +09:00
Junio C Hamano 16128765d7 Git 2.30.7
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE4fA2sf7nIh/HeOzvsLXohpav5ssFAmOYaKAACgkQsLXohpav
 5svHThAAhjTaBBBYDM6FbHcFUHv515fSo04AmyXl6QKdjiBroaGj+WpJPrYM3B5G
 eR0MYfwDfp7rAvxPQwq1LzSQLz2G2Swue1a4X0t3Bomjpqf48OeAUleGHQlBUTm7
 wfZIHpgCbMBIHJtVAVPiEOo43ZJ1OareCwVpPOpAXLVgTU2Bbx59K0oUqGszjgE3
 anQ0kon6hELZ9aBTx80hUJaYWaxiUqENtRFs6vyOV/MKvW2KR+MJgvu/SQqbRJPy
 ndBJ5r0gcSbes0OLxKCAFFNVt2p6BeVb4IxyPogJveGwJsNU88DQnarSos7hvPYG
 DkhTzfpPmFJkP0WiRHr87jWCXNJraq1SmK65ac1CGV/NTrDfX9ZNoGIRFsHfLmw2
 1poTxhB/h0F4wCucZu7Wavvgd2NI2V+GK5dx8Mx5NovrC67smBny2W7kQgXJCdZX
 e6vNuKVK7pz3cVYvo5GbUo2ivY2igm9Xbj3Na1/Ie8wTFaZ0ZX+oRnxxAdwKbL/1
 X0VRUTQMgtrrLd24JCApo8r5+Ssg0HvIOpXcUZFpvaYl9kMltatwV1Y01lNAhAgF
 VFBvUWdFy5tGzPzSCd3w2NyZOJBng2GdKw9YUt/WVWCKeiLXLI3wh10pC+m1qJus
 HJwQbRRSzC4mhXlkKZ5IG+Xz7x+HrHFnLpQXhtjeSc5WwGQkE2w=
 =syKo
 -----END PGP SIGNATURE-----

Sync with Git 2.30.7
2022-12-13 21:02:20 +09:00
Simon Gerber 0918d08887 help.c: fix autocorrect in work tree for bare repository
Currently, auto correction doesn't work reliably for commands which must
run in a work tree (e.g. `git status`) in Git work trees which are
created from a bare repository.

As far as I'm able to determine, this has been broken since commit
659fef199f (help: use early config when autocorrecting aliases,
2017-06-14), where the call to `git_config()` in `help_unknown_cmd()`
was replaced with a call to `read_early_config()`. From what I can tell,
the actual cause for the unexpected error is that we call
`git_default_config()` in the `git_unknown_cmd_config` callback instead
of simply returning `0` for config entries which we aren't interested
in.

Calling `git_default_config()` in this callback to `read_early_config()`
seems like a bad idea since those calls will initialize a bunch of state
in `environment.c` (among other things `is_bare_repository_cfg`) before
we've properly detected that we're running in a work tree.

All other callbacks provided to `read_early_config()` appear to only
extract their configurations while simply returning `0` for all other
config keys.

This commit changes the `git_unknown_cmd_config` callback to not call
`git_default_config()`. Instead we also simply return `0` for config
keys which we're not interested in.

Additionally the commit adds a new test case covering `help.autocorrect`
in a work tree created from a bare clone.

Signed-off-by: Simon Gerber <gesimu@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13 10:01:53 +09:00
Johannes Schindelin a3795bf0e6 tests(mingw): avoid very slow mingw_test_cmp
When Git's test suite uses `test_cmp`, it is not actually trying to
compare binary files as the name `cmp` would suggest to users familiar
with Unix' tools, but the tests instead verify that actual output
matches the expected text.

On Unix, `cmp` works well enough for Git's purposes because only Line
Feed characters are used as line endings. However, on Windows, while
most tools accept Line Feeds as line endings, many tools produce
Carriage Return + Line Feed line endings, including some of the tools
used by the test suite (which are therefore provided via Git for Windows
SDK). Therefore, `cmp` would frequently fail merely due to different
line endings.

To accommodate for that, the `mingw_test_cmp` function was introduced
into Git's test suite to perform a line-by-line comparison that ignores
line endings. This function is a Bash function that is only used on
Windows, everywhere else `cmp` is used.

This is a double whammy because `cmp` is fast, and `mingw_test_cmp` is
slow, even more so on Windows because it is a Bash script function, and
Bash scripts are known to run particularly slowly on Windows due to
Bash's need for the POSIX emulation layer provided by the MSYS2 runtime.

The commit message of 32ed3314c1 (t5351: avoid using `test_cmp` for
binary data, 2022-07-29) provides an illuminating account of the
consequences: On Windows, the platform on which Git could really use all
the help it can get to improve its performance, the time spent on one
entire test script was reduced from half an hour to less than half a
minute merely by avoiding a single call to `mingw_test_cmp` in but a
single test case.

Learning the lesson to avoid shell scripting wherever possible, the Git
for Windows project implemented a minimal replacement for
`mingw_test_cmp` in the form of a `test-tool` subcommand that parses the
input files line by line, ignoring line endings, and compares them.
Essentially the same thing as `mingw_test_cmp`, but implemented in
C instead of Bash. This solution served the Git for Windows project
well, over years.

However, when this solution was finally upstreamed, the conclusion was
reached that a change to use `git diff --no-index` instead of
`mingw_test_cmp` was more easily reviewed and hence should be used
instead.

The reason why this approach was not even considered in Git for Windows
is that in 2007, there was already a motion on the table to use Git's
own diff machinery to perform comparisons in Git's test suite, but it
was dismissed in https://lore.kernel.org/git/xmqqbkrpo9or.fsf@gitster.g/
as undesirable because tests might potentially succeed due to bugs in
the diff machinery when they should not succeed, and those bugs could
therefore hide regressions that the tests try to prevent.

By the time Git for Windows' `mingw-test-cmp` in C was finally
contributed to the Git mailing list, reviewers agreed that the diff
machinery had matured enough and should be used instead.

When the concern was raised that the diff machinery, due to its
complexity, would perform substantially worse than the test helper
originally implemented in the Git for Windows project, a test
demonstrated that these performance differences are well lost within the
100+ minutes it takes to run Git's test suite on Windows.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13 07:18:06 +09:00
Victoria Dye 93a7bc8b28 rebase --update-refs: avoid unintended ref deletion
In b3b1a21d1a (sequencer: rewrite update-refs as user edits todo list,
2022-07-19), the 'todo_list_filter_update_refs()' step was added to handle
the removal of 'update-ref' lines from a 'rebase-todo'. Specifically, it
removes potential ref updates from the "update refs state" if a ref does not
have a corresponding 'update-ref' line.

However, because 'write_update_refs_state()' will not update the state if
the 'refs_to_oids' list was empty, removing *all* 'update-ref' lines will
result in the state remaining unchanged from how it was initialized (with
all refs' "after" OID being null). Then, when the ref update is applied, all
refs will be updated to null and consequently deleted.

To fix this, delete the 'update-refs' state file when 'refs_to_oids' is
empty. Additionally, add a tests covering "all update-ref lines removed"
cases.

Reported-by: herr.kaste <herr.kaste@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-12-09 19:31:45 +09:00
Patrick Steinhardt 27ab4784d5 fsck: implement checks for gitattributes
Recently, a vulnerability was reported that can lead to an out-of-bounds
write when reading an unreasonably large gitattributes file. The root
cause of this error are multiple integer overflows in different parts of
the code when there are either too many lines, when paths are too long,
when attribute names are too long, or when there are too many attributes
declared for a pattern.

As all of these are related to size, it seems reasonable to restrict the
size of the gitattributes file via git-fsck(1). This allows us to both
stop distributing known-vulnerable objects via common hosting platforms
that have fsck enabled, and users to protect themselves by enabling the
`fetch.fsckObjects` config.

There are basically two checks:

    1. We verify that size of the gitattributes file is smaller than
       100MB.

    2. We verify that the maximum line length does not exceed 2048
       bytes.

With the preceding commits, both of these conditions would cause us to
either ignore the complete gitattributes file or blob in the first case,
or the specific line in the second case. Now with these consistency
checks added, we also grow the ability to stop distributing such files
in the first place when `receive.fsckObjects` is enabled.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 17:07:04 +09:00
Junio C Hamano e0bfc0b3b9 Merge branch 'ps/attr-limits' into maint-2.32 2022-12-09 17:03:49 +09:00
Junio C Hamano 6662a836eb Merge branch 'ps/attr-limits' into maint-2.30 2022-12-09 16:05:52 +09:00
Patrick Steinhardt 304a50adff pretty: restrict input lengths for padding and wrapping formats
Both the padding and wrapping formatting directives allow the caller to
specify an integer that ultimately leads to us adding this many chars to
the result buffer. As a consequence, it is trivial to e.g. allocate 2GB
of RAM via a single formatting directive and cause resource exhaustion
on the machine executing this logic. Furthermore, it is debatable
whether there are any sane usecases that require the user to pad data to
2GB boundaries or to indent wrapped data by 2GB.

Restrict the input sizes to 16 kilobytes at a maximum to limit the
amount of bytes that can be requested by the user. This is not meant
as a fix because there are ways to trivially amplify the amount of
data we generate via formatting directives; the real protection is
achieved by the changes in previous steps to catch and avoid integer
wraparound that causes us to under-allocate and access beyond the
end of allocated memory reagions. But having such a limit
significantly helps fuzzing the pretty format, because the fuzzer is
otherwise quite fast to run out-of-memory as it discovers these
formatters.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:21 +09:00
Patrick Steinhardt 81c2d4c3a5 utf8: fix checking for glyph width in strbuf_utf8_replace()
In `strbuf_utf8_replace()`, we call `utf8_width()` to compute the width
of the current glyph. If the glyph is a control character though it can
be that `utf8_width()` returns `-1`, but because we assign this value to
a `size_t` the conversion will cause us to underflow. This bug can
easily be triggered with the following command:

    $ git log --pretty='format:xxx%<|(1,trunc)%x10'

>From all I can see though this seems to be a benign underflow that has
no security-related consequences.

Fix the bug by using an `int` instead. When we see a control character,
we now copy it into the target buffer but don't advance the current
width of the string.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:21 +09:00
Patrick Steinhardt 937b71cc8b utf8: fix overflow when returning string width
The return type of both `utf8_strwidth()` and `utf8_strnwidth()` is
`int`, but we operate on string lengths which are typically of type
`size_t`. This means that when the string is longer than `INT_MAX`, we
will overflow and thus return a negative result.

This can lead to an out-of-bounds write with `--pretty=format:%<1)%B`
and a commit message that is 2^31+1 bytes long:

    =================================================================
    ==26009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000001168 at pc 0x7f95c4e5f427 bp 0x7ffd8541c900 sp 0x7ffd8541c0a8
    WRITE of size 2147483649 at 0x603000001168 thread T0
        #0 0x7f95c4e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
        #1 0x5612bbb1068c in format_and_pad_commit pretty.c:1763
        #2 0x5612bbb1087a in format_commit_item pretty.c:1801
        #3 0x5612bbc33bab in strbuf_expand strbuf.c:429
        #4 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
        #5 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
        #6 0x5612bba0a4d5 in show_log log-tree.c:781
        #7 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
        #8 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
        #9 0x5612bb69235b in cmd_log_walk builtin/log.c:549
        #10 0x5612bb6951a2 in cmd_log builtin/log.c:883
        #11 0x5612bb56c993 in run_builtin git.c:466
        #12 0x5612bb56d397 in handle_builtin git.c:721
        #13 0x5612bb56db07 in run_argv git.c:788
        #14 0x5612bb56e8a7 in cmd_main git.c:923
        #15 0x5612bb803682 in main common-main.c:57
        #16 0x7f95c4c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #17 0x7f95c4c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #18 0x5612bb5680e4 in _start ../sysdeps/x86_64/start.S:115

    0x603000001168 is located 0 bytes to the right of 24-byte region [0x603000001150,0x603000001168)
    allocated by thread T0 here:
        #0 0x7f95c4ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x5612bbcdd556 in xrealloc wrapper.c:136
        #2 0x5612bbc310a3 in strbuf_grow strbuf.c:99
        #3 0x5612bbc32acd in strbuf_add strbuf.c:298
        #4 0x5612bbc33aec in strbuf_expand strbuf.c:418
        #5 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
        #6 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
        #7 0x5612bba0a4d5 in show_log log-tree.c:781
        #8 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
        #9 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
        #10 0x5612bb69235b in cmd_log_walk builtin/log.c:549
        #11 0x5612bb6951a2 in cmd_log builtin/log.c:883
        #12 0x5612bb56c993 in run_builtin git.c:466
        #13 0x5612bb56d397 in handle_builtin git.c:721
        #14 0x5612bb56db07 in run_argv git.c:788
        #15 0x5612bb56e8a7 in cmd_main git.c:923
        #16 0x5612bb803682 in main common-main.c:57
        #17 0x7f95c4c3c28f  (/usr/lib/libc.so.6+0x2328f)

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
    Shadow bytes around the buggy address:
      0x0c067fff81d0: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
      0x0c067fff81e0: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
      0x0c067fff81f0: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
      0x0c067fff8200: fd fd fd fa fa fa fd fd fd fd fa fa 00 00 00 fa
      0x0c067fff8210: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
    =>0x0c067fff8220: fd fa fa fa fd fd fd fa fa fa 00 00 00[fa]fa fa
      0x0c067fff8230: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8240: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8250: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8260: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8270: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==26009==ABORTING

Now the proper fix for this would be to convert both functions to return
an `size_t` instead of an `int`. But given that this commit may be part
of a security release, let's instead do the minimal viable fix and die
in case we see an overflow.

Add a test that would have previously caused us to crash.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:21 +09:00
Patrick Steinhardt 17d23e8a38 utf8: fix returning negative string width
The `utf8_strnwidth()` function calls `utf8_width()` in a loop and adds
its returned width to the end result. `utf8_width()` can return `-1`
though in case it reads a control character, which means that the
computed string width is going to be wrong. In the worst case where
there are more control characters than non-control characters, we may
even return a negative string width.

Fix this bug by treating control characters as having zero width.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:21 +09:00
Patrick Steinhardt 48050c42c7 pretty: fix integer overflow in wrapping format
The `%w(width,indent1,indent2)` formatting directive can be used to
rewrap text to a specific width and is designed after git-shortlog(1)'s
`-w` parameter. While the three parameters are all stored as `size_t`
internally, `strbuf_add_wrapped_text()` accepts integers as input. As a
result, the casted integers may overflow. As these now-negative integers
are later on passed to `strbuf_addchars()`, we will ultimately run into
implementation-defined behaviour due to casting a negative number back
to `size_t` again. On my platform, this results in trying to allocate
9000 petabyte of memory.

Fix this overflow by using `cast_size_t_to_int()` so that we reject
inputs that cannot be represented as an integer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:21 +09:00
Patrick Steinhardt 1de69c0cdd pretty: fix adding linefeed when placeholder is not expanded
When a formatting directive has a `+` or ` ` after the `%`, then we add
either a line feed or space if the placeholder expands to a non-empty
string. In specific cases though this logic doesn't work as expected,
and we try to add the character even in the case where the formatting
directive is empty.

One such pattern is `%w(1)%+d%+w(2)`. `%+d` expands to reference names
pointing to a certain commit, like in `git log --decorate`. For a tagged
commit this would for example expand to `\n (tag: v1.0.0)`, which has a
leading newline due to the `+` modifier and a space added by `%d`. Now
the second wrapping directive will cause us to rewrap the text to
`\n(tag:\nv1.0.0)`, which is one byte shorter due to the missing leading
space. The code that handles the `+` magic now notices that the length
has changed and will thus try to insert a leading line feed at the
original posititon. But as the string was shortened, the original
position is past the buffer's boundary and thus we die with an error.

Now there are two issues here:

    1. We check whether the buffer length has changed, not whether it
       has been extended. This causes us to try and add the character
       past the string boundary.

    2. The current logic does not make any sense whatsoever. When the
       string got expanded due to the rewrap, putting the separator into
       the original position is likely to put it somewhere into the
       middle of the rewrapped contents.

It is debatable whether `%+w()` makes any sense in the first place.
Strictly speaking, the placeholder never expands to a non-empty string,
and consequentially we shouldn't ever accept this combination. We thus
fix the bug by simply refusing `%+w()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:21 +09:00
Patrick Steinhardt f6e0b9f389 pretty: fix out-of-bounds read when parsing invalid padding format
An out-of-bounds read can be triggered when parsing an incomplete
padding format string passed via `--pretty=format` or in Git archives
when files are marked with the `export-subst` gitattribute.

This bug exists since we have introduced support for truncating output
via the `trunc` keyword a7f01c6b4d (pretty: support truncating in %>, %<
and %><, 2013-04-19). Before this commit, we used to find the end of the
formatting string by using strchr(3P). This function returns a `NULL`
pointer in case the character in question wasn't found. The subsequent
check whether any character was found thus simply checked the returned
pointer. After the commit we switched to strcspn(3P) though, which only
returns the offset to the first found character or to the trailing NUL
byte. As the end pointer is now computed by adding the offset to the
start pointer it won't be `NULL` anymore, and as a consequence the check
doesn't do anything anymore.

The out-of-bounds data that is being read can in fact end up in the
formatted string. As a consequence, it is possible to leak memory
contents either by calling git-log(1) or via git-archive(1) when any of
the archived files is marked with the `export-subst` gitattribute.

    ==10888==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000398 at pc 0x7f0356047cb2 bp 0x7fff3ffb95d0 sp 0x7fff3ffb8d78
    READ of size 1 at 0x602000000398 thread T0
        #0 0x7f0356047cb1 in __interceptor_strchrnul /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725
        #1 0x563b7cec9a43 in strbuf_expand strbuf.c:417
        #2 0x563b7cda7060 in repo_format_commit_message pretty.c:1869
        #3 0x563b7cda8d0f in pretty_print_commit pretty.c:2161
        #4 0x563b7cca04c8 in show_log log-tree.c:781
        #5 0x563b7cca36ba in log_tree_commit log-tree.c:1117
        #6 0x563b7c927ed5 in cmd_log_walk_no_free builtin/log.c:508
        #7 0x563b7c92835b in cmd_log_walk builtin/log.c:549
        #8 0x563b7c92b1a2 in cmd_log builtin/log.c:883
        #9 0x563b7c802993 in run_builtin git.c:466
        #10 0x563b7c803397 in handle_builtin git.c:721
        #11 0x563b7c803b07 in run_argv git.c:788
        #12 0x563b7c8048a7 in cmd_main git.c:923
        #13 0x563b7ca99682 in main common-main.c:57
        #14 0x7f0355e3c28f  (/usr/lib/libc.so.6+0x2328f)
        #15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115

    0x602000000398 is located 0 bytes to the right of 8-byte region [0x602000000390,0x602000000398)
    allocated by thread T0 here:
        #0 0x7f0356072faa in __interceptor_strdup /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:439
        #1 0x563b7cf7317c in xstrdup wrapper.c:39
        #2 0x563b7cd9a06a in save_user_format pretty.c:40
        #3 0x563b7cd9b3e5 in get_commit_format pretty.c:173
        #4 0x563b7ce54ea0 in handle_revision_opt revision.c:2456
        #5 0x563b7ce597c9 in setup_revisions revision.c:2850
        #6 0x563b7c9269e0 in cmd_log_init_finish builtin/log.c:269
        #7 0x563b7c927362 in cmd_log_init builtin/log.c:348
        #8 0x563b7c92b193 in cmd_log builtin/log.c:882
        #9 0x563b7c802993 in run_builtin git.c:466
        #10 0x563b7c803397 in handle_builtin git.c:721
        #11 0x563b7c803b07 in run_argv git.c:788
        #12 0x563b7c8048a7 in cmd_main git.c:923
        #13 0x563b7ca99682 in main common-main.c:57
        #14 0x7f0355e3c28f  (/usr/lib/libc.so.6+0x2328f)
        #15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725 in __interceptor_strchrnul
    Shadow bytes around the buggy address:
      0x0c047fff8020: fa fa fd fd fa fa 00 06 fa fa 05 fa fa fa fd fd
      0x0c047fff8030: fa fa 00 02 fa fa 06 fa fa fa 05 fa fa fa fd fd
      0x0c047fff8040: fa fa 00 07 fa fa 03 fa fa fa fd fd fa fa 00 00
      0x0c047fff8050: fa fa 00 01 fa fa fd fd fa fa 00 00 fa fa 00 01
      0x0c047fff8060: fa fa 00 06 fa fa 00 06 fa fa 05 fa fa fa 05 fa
    =>0x0c047fff8070: fa fa 00[fa]fa fa fd fa fa fa fd fd fa fa fd fd
      0x0c047fff8080: fa fa fd fd fa fa 00 00 fa fa 00 fa fa fa fd fa
      0x0c047fff8090: fa fa fd fd fa fa 00 00 fa fa fa fa fa fa fa fa
      0x0c047fff80a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff80b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff80c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==10888==ABORTING

Fix this bug by checking whether `end` points at the trailing NUL byte.
Add a test which catches this out-of-bounds read and which demonstrates
that we used to write out-of-bounds data into the formatted message.

Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Original-patch-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:21 +09:00
Patrick Steinhardt b49f309aa1 pretty: fix out-of-bounds read when left-flushing with stealing
With the `%>>(<N>)` pretty formatter, you can ask git-log(1) et al to
steal spaces. To do so we need to look ahead of the next token to see
whether there are spaces there. This loop takes into account ANSI
sequences that end with an `m`, and if it finds any it will skip them
until it finds the first space. While doing so it does not take into
account the buffer's limits though and easily does an out-of-bounds
read.

Add a test that hits this behaviour. While we don't have an easy way to
verify this, the test causes the following failure when run with
`SANITIZE=address`:

    ==37941==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000000baf at pc 0x55ba6f88e0d0 bp 0x7ffc84c50d20 sp 0x7ffc84c50d10
    READ of size 1 at 0x603000000baf thread T0
        #0 0x55ba6f88e0cf in format_and_pad_commit pretty.c:1712
        #1 0x55ba6f88e7b4 in format_commit_item pretty.c:1801
        #2 0x55ba6f9b1ae4 in strbuf_expand strbuf.c:429
        #3 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869
        #4 0x55ba6f890ccf in pretty_print_commit pretty.c:2161
        #5 0x55ba6f7884c8 in show_log log-tree.c:781
        #6 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117
        #7 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508
        #8 0x55ba6f41035b in cmd_log_walk builtin/log.c:549
        #9 0x55ba6f4131a2 in cmd_log builtin/log.c:883
        #10 0x55ba6f2ea993 in run_builtin git.c:466
        #11 0x55ba6f2eb397 in handle_builtin git.c:721
        #12 0x55ba6f2ebb07 in run_argv git.c:788
        #13 0x55ba6f2ec8a7 in cmd_main git.c:923
        #14 0x55ba6f581682 in main common-main.c:57
        #15 0x7f2d08c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #16 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #17 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115

    0x603000000baf is located 1 bytes to the left of 24-byte region [0x603000000bb0,0x603000000bc8)
    allocated by thread T0 here:
        #0 0x7f2d08ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x55ba6fa5b494 in xrealloc wrapper.c:136
        #2 0x55ba6f9aefdc in strbuf_grow strbuf.c:99
        #3 0x55ba6f9b0a06 in strbuf_add strbuf.c:298
        #4 0x55ba6f9b1a25 in strbuf_expand strbuf.c:418
        #5 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869
        #6 0x55ba6f890ccf in pretty_print_commit pretty.c:2161
        #7 0x55ba6f7884c8 in show_log log-tree.c:781
        #8 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117
        #9 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508
        #10 0x55ba6f41035b in cmd_log_walk builtin/log.c:549
        #11 0x55ba6f4131a2 in cmd_log builtin/log.c:883
        #12 0x55ba6f2ea993 in run_builtin git.c:466
        #13 0x55ba6f2eb397 in handle_builtin git.c:721
        #14 0x55ba6f2ebb07 in run_argv git.c:788
        #15 0x55ba6f2ec8a7 in cmd_main git.c:923
        #16 0x55ba6f581682 in main common-main.c:57
        #17 0x7f2d08c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #18 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #19 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115

    SUMMARY: AddressSanitizer: heap-buffer-overflow pretty.c:1712 in format_and_pad_commit
    Shadow bytes around the buggy address:
      0x0c067fff8120: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
      0x0c067fff8130: fd fd fa fa fd fd fd fd fa fa fd fd fd fa fa fa
      0x0c067fff8140: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
      0x0c067fff8150: fa fa fd fd fd fd fa fa 00 00 00 fa fa fa fd fd
      0x0c067fff8160: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
    =>0x0c067fff8170: fd fd fd fa fa[fa]00 00 00 fa fa fa 00 00 00 fa
      0x0c067fff8180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8190: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff81a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff81b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff81c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb

Luckily enough, this would only cause us to copy the out-of-bounds data
into the formatted commit in case we really had an ANSI sequence
preceding our buffer. So this bug likely has no security consequences.

Fix it regardless by not traversing past the buffer's start.

Reported-by: Patrick Steinhardt <ps@pks.im>
Reported-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:21 +09:00
Patrick Steinhardt 81dc898df9 pretty: fix out-of-bounds write caused by integer overflow
When using a padding specifier in the pretty format passed to git-log(1)
we need to calculate the string length in several places. These string
lengths are stored in `int`s though, which means that these can easily
overflow when the input lengths exceeds 2GB. This can ultimately lead to
an out-of-bounds write when these are used in a call to memcpy(3P):

        ==8340==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1ec62f97fe at pc 0x7f2127e5f427 bp 0x7ffd3bd63de0 sp 0x7ffd3bd63588
    WRITE of size 1 at 0x7f1ec62f97fe thread T0
        #0 0x7f2127e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
        #1 0x5628e96aa605 in format_and_pad_commit pretty.c:1762
        #2 0x5628e96aa7f4 in format_commit_item pretty.c:1801
        #3 0x5628e97cdb24 in strbuf_expand strbuf.c:429
        #4 0x5628e96ab060 in repo_format_commit_message pretty.c:1869
        #5 0x5628e96acd0f in pretty_print_commit pretty.c:2161
        #6 0x5628e95a44c8 in show_log log-tree.c:781
        #7 0x5628e95a76ba in log_tree_commit log-tree.c:1117
        #8 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508
        #9 0x5628e922c35b in cmd_log_walk builtin/log.c:549
        #10 0x5628e922f1a2 in cmd_log builtin/log.c:883
        #11 0x5628e9106993 in run_builtin git.c:466
        #12 0x5628e9107397 in handle_builtin git.c:721
        #13 0x5628e9107b07 in run_argv git.c:788
        #14 0x5628e91088a7 in cmd_main git.c:923
        #15 0x5628e939d682 in main common-main.c:57
        #16 0x7f2127c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #17 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #18 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115

    0x7f1ec62f97fe is located 2 bytes to the left of 4831838265-byte region [0x7f1ec62f9800,0x7f1fe62f9839)
    allocated by thread T0 here:
        #0 0x7f2127ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x5628e98774d4 in xrealloc wrapper.c:136
        #2 0x5628e97cb01c in strbuf_grow strbuf.c:99
        #3 0x5628e97ccd42 in strbuf_addchars strbuf.c:327
        #4 0x5628e96aa55c in format_and_pad_commit pretty.c:1761
        #5 0x5628e96aa7f4 in format_commit_item pretty.c:1801
        #6 0x5628e97cdb24 in strbuf_expand strbuf.c:429
        #7 0x5628e96ab060 in repo_format_commit_message pretty.c:1869
        #8 0x5628e96acd0f in pretty_print_commit pretty.c:2161
        #9 0x5628e95a44c8 in show_log log-tree.c:781
        #10 0x5628e95a76ba in log_tree_commit log-tree.c:1117
        #11 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508
        #12 0x5628e922c35b in cmd_log_walk builtin/log.c:549
        #13 0x5628e922f1a2 in cmd_log builtin/log.c:883
        #14 0x5628e9106993 in run_builtin git.c:466
        #15 0x5628e9107397 in handle_builtin git.c:721
        #16 0x5628e9107b07 in run_argv git.c:788
        #17 0x5628e91088a7 in cmd_main git.c:923
        #18 0x5628e939d682 in main common-main.c:57
        #19 0x7f2127c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #20 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #21 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
    Shadow bytes around the buggy address:
      0x0fe458c572a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    =>0x0fe458c572f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
      0x0fe458c57300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==8340==ABORTING

The pretty format can also be used in `git archive` operations via the
`export-subst` attribute. So this is what in our opinion makes this a
critical issue in the context of Git forges which allow to download an
archive of user supplied Git repositories.

Fix this vulnerability by using `size_t` instead of `int` to track the
string lengths. Add tests which detect this vulnerability when Git is
compiled with the address sanitizer.

Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Original-patch-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Modified-by: Taylor  Blau <me@ttalorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:21 +09:00
Carlo Marcelo Arenas Belón a244dc5b0a test-lib: add prerequisite for 64-bit platforms
Allow tests that assume a 64-bit `size_t` to be skipped in 32-bit
platforms and regardless of the size of `long`.

This imitates the `LONG_IS_64BIT` prerequisite.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-09 14:26:04 +09:00
Eric Sunshine 35c194dc57 t1509: facilitate repeated script invocations
t1509-root-work-tree.sh, which tests behavior of a Git repository
located at the root `/` directory, refuses to run if it detects the
presence of an existing repository at `/`. This safeguard ensures that
it won't clobber a legitimate repository at that location. However,
because t1509 does a poor job of cleaning up after itself, it runs afoul
of its own safety check on subsequent runs, which makes it painful to
run the script repeatedly since each run requires manual cleanup of
detritus from the previous run.

Address this shortcoming by making t1509 clean up after itself as its
last action. This is safe since the script can only make it to this
cleanup action if it did not find a legitimate repository at `/` in the
first place, so the resources cleaned up here can only have been created
by the script itself.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
2022-12-09 10:41:59 +09:00
Eric Sunshine ce153b8d4d t1509: make "setup" test more robust
One of the t1509 setup tests is very particular about the output it
expects from `git init`, and fails if the output differs even slightly
which can happen easily if the script is run multiple times since it
doesn't do a good job of cleaning up after itself (i.e. it leaves
detritus in the root directory `/`). One bit of cruft in particular
(`/HEAD`) makes the test fail since its presence causes `git init` to
alter its output; rather than reporting "Initialized empty Git
repository", it instead reports "Reinitialized existing Git repository"
when `/HEAD` is present. Address this problem by making the test do a
more careful job of crafting its intended initial state.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
2022-12-09 10:41:58 +09:00
Eric Sunshine 7790b8c6b5 t1509: fix failing "root work tree" test due to owner-check
When 8959555cee (setup_git_directory(): add an owner check for the
top-level directory, 2022-03-02) tightened security surrounding
directory ownership, it neglected to adjust t1509-root-work-tree.sh to
take the new restriction into account. As a result, since the root
directory `/` is typically not owned by the user running the test
(indeed, t1509 refuses to run as `root`), the ownership check added
by 8959555cee kicks in and causes the test to fail:

    fatal: detected dubious ownership in repository at '/'
    To add an exception for this directory, call:

        git config --global --add safe.directory /

This problem went unnoticed for so long because t1509 is rarely run
since it requires setting up a `chroot` environment or a sacrificial
virtual machine in which `/` can be made writable and polluted by any
user.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
2022-12-09 10:41:58 +09:00
René Scharfe 86325d36e6 t3920: support CR-eating grep
grep(1) converts CRLF line endings to LF on current MinGW:

   $ uname -sr
   MINGW64_NT-10.0-22621 3.3.6-341.x86_64

   $ printf 'a\r\n' | hexdump.exe -C
   00000000  61 0d 0a                                          |a..|
   00000003

   $ printf 'a\r\n' | grep . | hexdump.exe -C
   00000000  61 0a                                             |a.|
   00000002

Create the intended test file by grepping the original file with LF
line endings and adding CRs explicitly.

The missing CRs went unnoticed because test_cmp on MinGW ignores line
endings since 4d715ac05c (Windows: a test_cmp that is agnostic to random
LF <> CRLF conversions, 2013-10-26).  Fix this test anyway to avoid
depending on that special test_cmp behavior, especially since this is
the only test that needs it.

Piping the output of grep(1) through append_cr has the side-effect of
ignoring its return value.  That means we no longer need the explicit
"|| true" to support commit messages without a body.

Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-07 13:33:18 +09:00
Johannes Schindelin 95494c6f61 t0021: use Windows-friendly pwd
In Git for Windows, when passing paths from shell scripts to regular
Win32 executables, thanks to the MSYS2 runtime a somewhat magic path
conversion happens that lets the shell script think that there is a file
at `/git/Makefile` and the Win32 process it spawned thinks that the
shell script said `C:/git-sdk-64/git/Makefile` instead.

This conversion is documented in detail over here:
https://www.msys2.org/docs/filesystem-paths/#automatic-unix-windows-path-conversion

As all automatic conversions, there are gaps. For example, to avoid
mistaking command-line options like `/LOG=log.txt` (which are quite
common in the Windows world) from being mistaken for a Unix-style
absolute path, the MSYS2 runtime specifically exempts arguments
containing a `=` character from that conversion.

We are about to change `test_cmp` to use `git diff --no-index`, which
involves spawning precisely such a Win32 process.

In combination, this would cause a failure in `t0021-conversion.sh`
where we pass an absolute path containing an equal character to the
`test_cmp` function.

Seeing as the Unix tools like `cp` and `diff` that are used by Git's
test suite in the Git for Windows SDK (thanks to the MSYS2 project)
understand both Unix-style as well as Windows-style paths, we can stave
off this problem by simply switching to Windows-style paths and
side-stepping the need for any automatic path conversion.

Note: The `PATH` variable is obviously special, as it is colon-separated
in the MSYS2 Bash used by Git for Windows, and therefore _cannot_
contain absolute Windows-style paths, lest the colon after the drive
letter is mistaken for a path separator. Therefore, we need to be
careful to keep the Unix-style when modifying the `PATH` variable.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-07 13:22:58 +09:00
Patrick Steinhardt 3c50032ff5 attr: ignore overly large gitattributes files
Similar as with the preceding commit, start ignoring gitattributes files
that are overly large to protect us against out-of-bounds reads and
writes caused by integer overflows. Unfortunately, we cannot just define
"overly large" in terms of any preexisting limits in the codebase.

Instead, we choose a very conservative limit of 100MB. This is plenty of
room for specifying gitattributes, and incidentally it is also the limit
for blob sizes for GitHub. While we don't want GitHub to dictate limits
here, it is still sensible to use this fact for an informed decision
given that it is hosting a huge set of repositories. Furthermore, over
at GitLab we scanned a subset of repositories for their root-level
attribute files. We found that 80% of them have a gitattributes file
smaller than 100kB, 99.99% have one smaller than 1MB, and only a single
repository had one that was almost 3MB in size. So enforcing a limit of
100MB seems to give us ample of headroom.

With this limit in place we can be reasonably sure that there is no easy
way to exploit the gitattributes file via integer overflows anymore.
Furthermore, it protects us against resource exhaustion caused by
allocating the in-memory data structures required to represent the
parsed attributes.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 15:50:03 +09:00
Patrick Steinhardt dfa6b32b5e attr: ignore attribute lines exceeding 2048 bytes
There are two different code paths to read gitattributes: once via a
file, and once via the index. These two paths used to behave differently
because when reading attributes from a file, we used fgets(3P) with a
buffer size of 2kB. Consequentially, we silently truncate line lengths
when lines are longer than that and will then parse the remainder of the
line as a new pattern. It goes without saying that this is entirely
unexpected, but it's even worse that the behaviour depends on how the
gitattributes are parsed.

While this is simply wrong, the silent truncation saves us with the
recently discovered vulnerabilities that can cause out-of-bound writes
or reads with unreasonably long lines due to integer overflows. As the
common path is to read gitattributes via the worktree file instead of
via the index, we can assume that any gitattributes file that had lines
longer than that is already broken anyway. So instead of lifting the
limit here, we can double down on it to fix the vulnerabilities.

Introduce an explicit line length limit of 2kB that is shared across all
paths that read attributes and ignore any line that hits this limit
while printing a warning.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 15:33:07 +09:00
Patrick Steinhardt d74b1fd54f attr: fix silently splitting up lines longer than 2048 bytes
When reading attributes from a file we use fgets(3P) with a buffer size
of 2048 bytes. This means that as soon as a line exceeds the buffer size
we split it up into multiple parts and parse each of them as a separate
pattern line. This is of course not what the user intended, and even
worse the behaviour is inconsistent with how we read attributes from the
index.

Fix this bug by converting the code to use `strbuf_getline()` instead.
This will indeed read in the whole line, which may theoretically lead to
an out-of-memory situation when the gitattributes file is huge. We're
about to reject any gitattributes files larger than 100MB in the next
commit though, which makes this less of a concern.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 15:29:30 +09:00
Johannes Sixt 500317ae03 t3920: don't ignore errors of more than one command with || true
It is customary to write `A || true` to ignore a potential error exit of
command A. But when we have a sequence `A && B && C || true && D`, then
a failure of any of A, B, or C skips to D right away. This is not
intended here. Turn the command whose failure is to be ignored into a
compound command to ensure it is the only one that is allowed to fail.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 10:02:34 +09:00
Ævar Arnfjörð Bjarmason 5f3bfdc4f3 t4023: fix ignored exit codes of git
Change a "git diff-tree" command to be &&-chained so that we won't
ignore its exit code, see the ea05fd5fbf (Merge branch
'ab/keep-git-exit-codes-in-tests', 2022-03-16) topic for prior art.

This fixes code added in b45563a229 (rename: Break filepairs with
different types., 2007-11-30). Due to hiding the exit code we hid a
memory leak under SANITIZE=leak.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 09:28:04 +09:00
Ævar Arnfjörð Bjarmason 4d81ce1b99 t7600: don't ignore "rev-parse" exit code in helper
Change the verify_mergeheads() helper the check the exit code of "git
rev-parse".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 09:27:32 +09:00
Ævar Arnfjörð Bjarmason 243caa8982 t5314: check exit code of "git"
Amend the test added in [1] to check the exit code of the "git"
invocations. An in-flight change[2] introduced a memory leak in these
invocations, which went undetected unless we were running under
"GIT_TEST_SANITIZE_LEAK_LOG=true".

Note that the in-flight change made 8 test files fail, but as far as I
can tell only this one would have had its exit code hidden unless
under "GIT_TEST_SANITIZE_LEAK_LOG=true". The rest would be caught
without it.

We could pick other variable names here than "ln%d", e.g. "commit",
"dummy_blob" and "file_blob", but having the "rev-parse" invocations
aligned makes the difference between them more readable, so let's pick
"ln%d".

1. 4cf2143e02 (pack-objects: break delta cycles before delta-search
   phase, 2016-08-11)
2. https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
3. faececa53f (test-lib: have the "check" mode for SANITIZE=leak
   consider leak logs, 2022-07-28)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-02 16:38:12 +09:00
René Scharfe 77e04b2ed4 t4205: don't exit test script on failure
Only abort the individual check instead of exiting the whole test script
if git show fails.  Noticed with GIT_TEST_PASSING_SANITIZE_LEAK=check.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-02 08:25:02 +09:00