Commit graph

20968 commits

Author SHA1 Message Date
Teng Long 425b4d7f47 push: introduce '--branches' option
The '--all' option of git-push built-in cmd support to push all branches
(refs under refs/heads) to remote. Under the usage, a user can easlily
work in some scenarios, for example, branches synchronization and batch
upload.

The '--all' was introduced for a long time, meanwhile, git supports to
customize the storage location under "refs/". when a new git user see
the usage like, 'git push origin --all', we might feel like we're
pushing _all_ the refs instead of just branches without looking at the
documents until we found the related description of it or '--mirror'.

To ensure compatibility, we cannot rename '--all' to another name
directly, one way is, we can try to add a new option '--heads' which be
identical with the functionality of '--all' to let the user understand
the meaning of representation more clearly. Actually, We've more or less
named options this way already, for example, in 'git-show-ref' and 'git
ls-remote'.

At the same time, we fix a related issue about the wrong help
information of '--all' option in code and add some test cases in
t5523, t5543 and t5583.

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-06 14:36:43 -07:00
John Cai 44451a2e5e attr: teach "--attr-source=<tree>" global option to "git"
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=<tree>" option to
allow it to read attribute files from a tree-ish, but did so only
for the command.  Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.

Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=<tree>" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.

Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-06 14:34:09 -07:00
Junio C Hamano b7cf25c8f4 t9800: correct misuse of 'show -s --raw' in a test
There is $(git show -s --raw --pretty=format:%at HEAD) in this test
that is meant to grab the author time of the commit.  We used to
have a bug in the command line option parser of the diff family of
commands, where "show -s --raw" was identical to "show -s".

With the "-s" bug fixed, "show -s --raw" would mean the same thing
as "show --raw", i.e. show the output from the diff machinery in the
"raw" format.  And this test will start failing, so fix it before
that happens.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-06 14:30:51 -07:00
Junio C Hamano 83973981eb diff: plug leaks in dirstat
The array of dirstat_file contained in the dirstat_dir structure is
not freed after the processing ends.  Unfortunately, the member that
points at the array, .files, is incremented as the gather_dirstat()
function recursively walks it, and this needs to be plugged by
remembering the beginning of the array before gather_dirstat() mucks
with it and freeing it after we are done.

We can mark t4047 as leak-free.  t4000, which is marked as
leak-free, now can exercise dirstat in it, which will happen next.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-05 14:24:09 -07:00
Andrei Rybak a5855fd8d4 t2019: don't create unused files
Tests in t2019-checkout-ambiguous-ref.sh redirect two invocations of
"git checkout" to files "stdout" and "stderr".  Several assertions are
made using file "stderr".  File "stdout", however, is unused.

Don't redirect standard output of "git checkout" to file "stdout" in
t2019-checkout-ambiguous-ref.sh to avoid creating unnecessary files.

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-03 08:53:10 -07:00
Andrei Rybak dca675c6ef t1502: don't create unused files
Three tests in file t1502-rev-parse-parseopt.sh use three redirections
with invocation of "git rev-parse --parseopt --".  All three tests
redirect standard output to file "out" and file "spec" to standard
input.  Two of the tests redirect standard output a second time to file
"actual", and the third test redirects standard error to file "err".
These tests check contents of files "actual" and "err", but don't use
the files named "out" for assertions.  The two tests that redirect to
standard output twice might also be confusing to the reader.

Don't redirect standard output of "git rev-parse" to file "out" in
t1502-rev-parse-parseopt.sh to avoid creating unnecessary files.

Acked-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-03 08:53:06 -07:00
Andrei Rybak 59162ece57 t1450: don't create unused files
Test 'fsck error and recovery on invalid object type' in file
t1450-fsck.sh redirects output of a failing "git fsck" invocation to
files "out" and "err" to assert presence of error messages in the output
of the command.  Commit 31deb28f5e (fsck: don't hard die on invalid
object types, 2021-10-01) changed the way assertions in this test are
performed.  The test doesn't compare the whole standard error with
prepared file "err.expect" and it doesn't assert that standard output is
empty.

Don't create unused files "err.expect" and "out" in test 'fsck error and
recovery on invalid object type'.

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-03 08:53:03 -07:00
Andrei Rybak a7cae2905b t1300: don't create unused files
Three tests in t1300-config.sh check that "git config --get" barfs when
syntax errors are present in the config file.  The tests redirect
standard output and standard error of "git config --get" to files,
"actual" and "error" correspondingly.  They assert presence of an error
message in file "error".  However, these tests don't use file "actual"
for assertions.

Don't redirect standard output of "git config --get" to file "actual" in
t1300-config.sh to avoid creating unnecessary files.

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-03 08:52:48 -07:00
Andrei Rybak 6fc68e7ca3 t1300: fix config file syntax error descriptions
Three tests in t1300-config.sh check that "git config --get" barfs when
the config file contains various syntax errors: key=value pair without
equals sign, broken section line, and broken value string.  The sample
config files include a comment describing the kind of broken syntax.
This description seems to have been copy-pasted from the "broken section
line" sample to the other two samples.

Fix descriptions of broken config file syntax in samples used in
t1300-config.sh.

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-03 08:52:45 -07:00
Andrei Rybak ed5288cff2 t0300: don't create unused file
Test 'credential config with partial URLs' in t0300-credentials.sh
contains three "git credential fill" invocations.  For two of the
invocations, the test asserts presence or absence of string "yep" in the
standard output.  For the third test it checks for an error message in
standard error.

Don't redirect standard output of "git credential" to file "stdout" in
t0300-credentials.sh to avoid creating an unnecessary file when only
standard error is checked.

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-03 08:52:17 -07:00
Junio C Hamano d699e27bd4 Merge branch 'tb/ban-strtok'
Mark strtok() and strtok_r() to be banned.

* tb/ban-strtok:
  banned.h: mark `strtok()` and `strtok_r()` as banned
  t/helper/test-json-writer.c: avoid using `strtok()`
  t/helper/test-oidmap.c: avoid using `strtok()`
  t/helper/test-hashmap.c: avoid using `strtok()`
  string-list: introduce `string_list_setlen()`
  string-list: multi-delimiter `string_list_split_in_place()`
2023-05-02 10:13:35 -07:00
Junio C Hamano cf85f4b3bd Merge branch 'jk/blame-fake-commit-label'
The output given by "git blame" that attributes a line to contents
taken from the file specified by the "--contents" option shows it
differently from a line attributed to the working tree file.

* jk/blame-fake-commit-label:
  blame: use different author name for fake commit generated by --contents
2023-05-02 10:13:35 -07:00
René Scharfe 31885f64e9 test-ctype: check EOF
The character classifiers are supposed to allow passing EOF to them, a
negative value.  It isn't part of any character class.  Extend the tests
to cover that.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-02 09:25:54 -07:00
Derrick Stolee 756f1bcd29 fsck: verify checksums of all .bitmap files
If a filesystem-level corruption occurs in a .bitmap file, Git can react
poorly. This could take the form of a run-time error due to failing to
parse an EWAH bitmap or be more subtle such as returning the wrong set
of objects to a fetch or clone.

A natural first response to either of these kinds of errors is to run
'git fsck' to see if any files are corrupt. This currently ignores all
.bitmap files.

Add checks to 'git fsck' for all .bitmap files that are currently
associated with a multi-pack-index or pack file. Verify their checksums
using the hashfile API.

We iterate through all multi-pack-indexes and pack-files to be sure to
check all .bitmap files, not just the one that would be read by the
process. For example, a multi-pack-index bitmap overrules a pack-bitmap.
However, if the multi-pack-index is removed, the pack-bitmap may be
selected instead. Be thorough to include every file that could become
active in such a way. This includes checking files in alternates.

There is potential that we could extend this effort to check the
structure of the reachability bitmaps themselves, but it is very
expensive to do so. At minimum, it's as expensive as generating the
bitmaps in the first place, and that's assuming that we don't use the
trivial algorithm of verifying each bitmap individually. The trivial
algorithm will result in quadratic behavior (number of objects times
number of bitmapped commits) while the bitmap building operation
constructs a lattice of commits to build bitmaps incrementally and then
generate the final bitmaps from a subset of those commits.

If we were to extend 'git fsck' to check .bitmap file contents more
closely like this, then we would likely want to hide it behind an option
that signals the user is more willing to do expensive operations such as
this.

For testing, set up a repository with a pack-bitmap _and_ a
multi-pack-index bitmap. This requires some file movement to avoid
deleting the pack-bitmap during the repack that creates the
multi-pack-index bitmap. We can then verify that 'git fsck' is checking
all files, not just the "active" bitmap.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-02 08:48:22 -07:00
Glen Choo e35f202b45 setup: trace bare repository setups
safe.bareRepository=explicit is a safer default mode of operation, since
it guards against the embedded bare repository attack [1]. Most end
users don't use bare repositories directly, so they should be able to
set safe.bareRepository=explicit, with the expectation that they can
reenable bare repositories by specifying GIT_DIR or --git-dir.

However, the user might use a tool that invokes Git on bare repositories
without setting GIT_DIR (e.g. "go mod" will clone bare repositories
[2]), so even if a user wanted to use safe.bareRepository=explicit, it
wouldn't be feasible until their tools learned to set GIT_DIR.

To make this transition easier, add a trace message to note when we
attempt to set up a bare repository without setting GIT_DIR. This allows
users and tool developers to audit which of their tools are problematic
and report/fix the issue.  When they are sufficiently confident, they
would switch over to "safe.bareRepository=explicit".

Note that this uses trace2_data_string(), which isn't supported by the
"normal" GIT_TRACE2 target, only _EVENT or _PERF.

[1] https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com/
[2] https://go.dev/ref/mod

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-01 11:20:33 -07:00
Taylor Blau 71201ab0e5 t/lib-credential.sh: ensure credential helpers handle long headers
Add a test ensuring that the "wwwauth[]" field cannot be used to
inject malicious data into the credential helper stream.

Many of the credential helpers in contrib/credential read the
newline-delimited protocol stream one line at a time by repeatedly
calling fgets() into a fixed-size buffer.

This assumes that each line is no more than 1024 characters long, since
each iteration of the loop assumes that it is parsing starting at the
beginning of a new line in the stream. However, similar to a5bb10fd5e
(config: avoid fixed-sized buffer when renaming/deleting a section,
2023-04-06), if a line is longer than 1024 characters, a malicious actor
can embed another command within an existing line, bypassing the usual
checks introduced in 9a6bbee800 (credential: avoid writing values with
newlines, 2020-03-11).

As with the problem fixed in that commit, specially crafted input can
cause the helper to return the credential for the wrong host, letting an
attacker trick the victim into sending credentials for one host to
another.

Luckily, all parts of the credential helper protocol that are available
in a tagged release of Git are immune to this attack:

  - "protocol" is restricted to known values, and is thus immune.

  - "host" is immune because curl will reject hostnames that have a '='
    character in them, which would be required to carry out this attack.

  - "username" is immune, because the buffer characters to fill out the
    first `fgets()` call would pollute the `username` field, causing the
    credential helper to return nothing (because it would match a
    username if present, and the username of the credential to be stolen
    is likely not 1024 characters).

  - "password" is immune because providing a password instructs
    credential helpers to avoid filling credentials in the first place.

  - "path" is similar to username; if present, it is not likely to match
    any credential the victim is storing. It's also not enabled by
    default; the victim would have to set credential.useHTTPPath
    explicitly.

However, the new "wwwauth[]" field introduced via 5f2117b24f
(credential: add WWW-Authenticate header to cred requests, 2023-02-27)
can be used to inject data into the credential helper stream. For
example, running:

    {
      printf 'HTTP/1.1 401\r\n'
      printf 'WWW-Authenticate: basic realm='
      perl -e 'print "a" x 1024'
      printf 'host=victim.com\r\n'
    } | nc -Nlp 8080

in one terminal, and then:

    git clone http://localhost:8080

in another would result in a line like:

    wwwauth[]=basic realm=aaa[...]aaahost=victim.com

being sent to the credential helper. If we tweak that "1024" to align
our output with the helper's buffer size and the rest of the data on the
line, it can cause the helper to see "host=victim.com" on its own line,
allowing motivated attackers to exfiltrate credentials belonging to
"victim.com".

The below test demonstrates these failures and provides us with a test
to ensure that our fix is correct. That said, it has a couple of
shortcomings:

  - it's in t0303, since that's the only mechanism we have for testing
    random helpers. But that means nobody is going to run it under
    normal circumstances.

  - to get the attack right, it has to line up the stuffed name with the
    buffer size, so we depend on the exact buffer size. I parameterized
    it so it could be used to test other helpers, but in practice it's
    not likely for anybody to do that.

Still, it's the best we can do, and will help us confirm the presence of
the problem (and our fixes) in the new few patches.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-01 09:27:01 -07:00
Maxim Cournoyer 3a7a18a045 send-email: detect empty blank lines in command output
The email format does not allow blank lines in headers; detect such
input and report it as malformed and add a test for it.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-01 08:55:52 -07:00
Maxim Cournoyer ba92106e93 send-email: add --header-cmd, --no-header-cmd options
Sometimes, adding a header different than CC or TO is desirable; for
example, when using Debbugs, it is best to use 'X-Debbugs-Cc' headers
to keep people in CC; this is an example use case enabled by the new
'--header-cmd' option.

The header unfolding logic is extracted to a subroutine so that it can
be reused; a test is added for coverage.

Signed-off-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-01 08:55:52 -07:00
Oswald Buddenhagen 8bb19c14fb t/t3501-revert-cherry-pick.sh: clarify scope of the file
The file started out as a test for picks and reverts with renames, but
has been subsequently populated with all kinds of basic tests, in
accordance with its generic name. Adjust the description to reflect
that.

Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-01 08:24:58 -07:00
Junio C Hamano aabc69885e Merge branch 'jk/gpg-trust-level-fix'
The "%GT" placeholder for the "--format" option of "git log" and
friends caused BUG() to trigger on a commit signed with an unknown
key, which has been corrected.

* jk/gpg-trust-level-fix:
  gpg-interface: set trust level of missing key to "undefined"
2023-04-28 16:03:03 -07:00
Junio C Hamano fc23c397c7 Merge branch 'tb/enable-cruft-packs-by-default'
When "gc" needs to retain unreachable objects, packing them into
cruft packs (instead of exploding them into loose object files) has
been offered as a more efficient option for some time.  Now the use
of cruft packs has been made the default and no longer considered
an experimental feature.

* tb/enable-cruft-packs-by-default:
  repository.h: drop unused `gc_cruft_packs`
  builtin/gc.c: make `gc.cruftPacks` enabled by default
  t/t9300-fast-import.sh: prepare for `gc --cruft` by default
  t/t6500-gc.sh: add additional test cases
  t/t6500-gc.sh: refactor cruft pack tests
  t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default
  t/t5304-prune.sh: prepare for `gc --cruft` by default
  builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
  builtin/repack.c: fix incorrect reference to '-C'
  pack-write.c: plug a leak in stage_tmp_packfiles()
2023-04-28 16:03:03 -07:00
Junio C Hamano a02675ad90 Merge branch 'ds/fsck-pack-revindex'
"git fsck" learned to validate the on-disk pack reverse index files.

* ds/fsck-pack-revindex:
  fsck: validate .rev file header
  fsck: check rev-index position values
  fsck: check rev-index checksums
  fsck: create scaffolding for rev-index checks
2023-04-27 16:00:59 -07:00
Junio C Hamano 849c8b3dbf Merge branch 'tb/pack-revindex-on-disk'
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.

* tb/pack-revindex-on-disk:
  t: invert `GIT_TEST_WRITE_REV_INDEX`
  config: enable `pack.writeReverseIndex` by default
  pack-revindex: introduce `pack.readReverseIndex`
  pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
  pack-revindex: make `load_pack_revindex` take a repository
  t5325: mark as leak-free
  pack-write.c: plug a leak in stage_tmp_packfiles()
2023-04-27 16:00:59 -07:00
Jeff King 089d9adff6 parse_commit(): handle broken whitespace-only timestamp
The comment in parse_commit_date() claims that parse_timestamp() will
not walk past the end of the buffer we've been given, since it will hit
the newline at "eol" and stop. This is usually true, when dateptr
contains actual numbers to parse. But with a line like:

   committer name <email>   \n

with just whitespace, and no numbers, parse_timestamp() will consume
that newline as part of the leading whitespace, and we may walk past our
"tail" pointer (which itself is set from the "size" parameter passed in
to parse_commit_buffer()).

In practice this can't cause us to walk off the end of an array, because
we always add an extra NUL byte to the end of objects we load from disk
(as a defense against exactly this kind of bug). However, you can see
the behavior in action when "committer" is the final header (which it
usually is, unless there's an encoding) and the subject line can be
parsed as an integer. We walk right past the newline on the committer
line, as well as the "\n\n" separator, and mistake the subject for the
timestamp.

We can solve this by trimming the whitespace ourselves, making sure that
it has some non-whitespace to parse. Note that we need to be a bit
careful about the definition of "whitespace" here, as our isspace()
doesn't match exotic characters like vertical tab or formfeed. We can
work around that by checking for an actual number (see the in-code
comment). This is slightly more restrictive than the current code, but
in practice the results are either the same (we reject "foo" as "0", but
so would parse_timestamp()) or extremely unlikely even for broken
commits (parse_timestamp() would allow "\v123" as "123", but we'll now
make it "0").

I did also allow "-" here, which may be controversial, as we don't
currently support negative timestamps. My reasoning was two-fold. One,
the design of parse_timestamp() is such that we should be able to easily
switch it to handling signed values, and this otherwise creates a
hard-to-find gotcha that anybody doing that work would get tripped up
on. And two, the status quo is that we currently parse them, though the
result of course ends up as a very large unsigned value (which is likely
to just get clamped to "0" for display anyway, since our date routines
can't handle it).

The new test checks the commit parser (via "--until") for both vanilla
spaces and the vertical-tab case. I also added a test to check these
against the pretty-print formatter, which uses split_ident_line().  It's
not subject to the same bug, because it already insists that there be
one or more digits in the timestamp.

Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 08:53:53 -07:00
Jeff King ea1615dfdd parse_commit(): parse timestamp from end of line
To find the committer timestamp, we parse left-to-right looking for the
closing ">" of the email, and then expect the timestamp right after
that. But we've seen some broken cases in the wild where this fails, but
we _could_ find the timestamp with a little extra work. E.g.:

  Name <Name<email>> 123456789 -0500

This means that features that rely on the committer timestamp, like
--since or --until, will treat the commit as happening at time 0 (i.e.,
1970).

This is doubly confusing because the pretty-print parser learned to
handle these in 03818a4a94 (split_ident: parse timestamp from end of
line, 2013-10-14). So printing them via "git show", etc, makes
everything look normal, but --until, etc are still broken (despite the
fact that that commit explicitly mentioned --until!).

So let's use the same trick as 03818a4a94: find the end of the line, and
parse back to the final ">". In theory we could use split_ident_line()
here, but it's actually a bit more strict. In particular, it requires a
valid time-zone token, too. That should be present, of course, but we
wouldn't want to break --until for cases that are working currently.

We might want to teach split_ident_line() to become more lenient there,
but it would require checking its many callers (since right now they can
assume that if date_start is non-NULL, so is tz_start).

So for now we'll just reimplement the same trick in the commit parser.

The test is in t4212, which already covers similar cases, courtesy of
03818a4a94. We'll just adjust the broken commit to munge both the author
and committer timestamps. Note that we could match (author|committer)
here, but alternation can't be used portably in sed. Since we wouldn't
expect to see ">" except as part of an ident line, we can just match
that character on any line.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 08:53:35 -07:00
Jeff King 2063b86b81 t4212: avoid putting git on left-hand side of pipe
We wouldn't expect cat-file to fail here, but it's good practice to
avoid putting git on the upstream of a pipe, as we otherwise ignore its
exit code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-27 08:53:32 -07:00
Junio C Hamano 36628c56ed Merge branch 'ps/fix-geom-repack-with-alternates'
Geometric repacking ("git repack --geometric=<n>") in a repository
that borrows from an alternate object database had various corner
case bugs, which have been corrected.

* ps/fix-geom-repack-with-alternates:
  repack: disable writing bitmaps when doing a local repack
  repack: honor `-l` when calculating pack geometry
  t/helper: allow chmtime to print verbosely without modifying mtime
  pack-objects: extend test coverage of `--stdin-packs` with alternates
  pack-objects: fix error when same packfile is included and excluded
  pack-objects: fix error when packing same pack twice
  pack-objects: split out `--stdin-packs` tests into separate file
  repack: fix generating multi-pack-index with only non-local packs
  repack: fix trying to use preferred pack in alternates
  midx: fix segfault with no packs and invalid preferred pack
2023-04-25 13:56:20 -07:00
Junio C Hamano c4c9d5586f Merge branch 'rj/send-email-validate-hook-count-messages'
The sendemail-validate validate hook learned to pass the total
number of input files and where in the sequence each invocation is
via environment variables.

* rj/send-email-validate-hook-count-messages:
  send-email: export patch counters in validate environment
2023-04-25 13:56:20 -07:00
Junio C Hamano 80d268f309 Merge branch 'jk/protocol-cap-parse-fix'
The code to parse capability list for v0 on-wire protocol fell into
an infinite loop when a capability appears multiple times, which
has been corrected.

* jk/protocol-cap-parse-fix:
  v0 protocol: use size_t for capability length/offset
  t5512: test "ls-remote --heads --symref" filtering with v0 and v2
  t5512: allow any protocol version for filtered symref test
  t5512: add v2 support for "ls-remote --symref" test
  v0 protocol: fix sha1/sha256 confusion for capabilities^{}
  t5512: stop referring to "v1" protocol
  v0 protocol: fix infinite loop when parsing multi-valued capabilities
2023-04-25 13:56:20 -07:00
Junio C Hamano 0807e57807 Merge branch 'en/header-split-cache-h'
Header clean-up.

* en/header-split-cache-h: (24 commits)
  protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
  mailmap, quote: move declarations of global vars to correct unit
  treewide: reduce includes of cache.h in other headers
  treewide: remove double forward declaration of read_in_full
  cache.h: remove unnecessary includes
  treewide: remove cache.h inclusion due to pager.h changes
  pager.h: move declarations for pager.c functions from cache.h
  treewide: remove cache.h inclusion due to editor.h changes
  editor: move editor-related functions and declarations into common file
  treewide: remove cache.h inclusion due to object.h changes
  object.h: move some inline functions and defines from cache.h
  treewide: remove cache.h inclusion due to object-file.h changes
  object-file.h: move declarations for object-file.c functions from cache.h
  treewide: remove cache.h inclusion due to git-zlib changes
  git-zlib: move declarations for git-zlib functions from cache.h
  treewide: remove cache.h inclusion due to object-name.h changes
  object-name.h: move declarations for object-name.c functions from cache.h
  treewide: remove unnecessary cache.h inclusion
  treewide: be explicit about dependence on mem-pool.h
  treewide: be explicit about dependence on oid-array.h
  ...
2023-04-25 13:56:20 -07:00
Junio C Hamano 9ce9dea4e1 Sync with Git 2.40.1 2023-04-24 22:31:32 -07:00
Taylor Blau a2742f8c59 t/helper/test-json-writer.c: avoid using strtok()
Apply similar treatment as in the previous commit to remove usage of
`strtok()` from the "oidmap" test helper.

Each of the different commands that the "json-writer" helper accepts
pops the next space-delimited token from the current line and interprets
it as a string, integer, or double (with the exception of the very first
token, which is the command itself).

To accommodate this, split the line in place by the space character, and
pass the corresponding string_list to each of the specialized `get_s()`,
`get_i()`, and `get_d()` functions.

`get_i()` and `get_d()` are thin wrappers around `get_s()` that convert
their result into the appropriate type by either calling `strtol()` or
`strtod()`, respectively. In `get_s()`, we mark the token as "consumed"
by incrementing the `consumed_nr` counter, indicating how many tokens we
have read up to that point.

Because each of these functions needs the string-list parts, the number
of tokens consumed, and the line number, these three are wrapped up in
to a struct representing the line state.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 16:01:28 -07:00
Taylor Blau deeabc1ff0 t/helper/test-oidmap.c: avoid using strtok()
Apply similar treatment as in the previous commit to remove usage of
`strtok()` from the "oidmap" test helper.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 16:01:28 -07:00
Taylor Blau 826f0e33ab t/helper/test-hashmap.c: avoid using strtok()
Avoid using the non-reentrant `strtok()` to separate the parts of each
incoming command. Instead of replacing it with `strtok_r()`, let's
instead use the more friendly pair of `string_list_split_in_place()` and
`string_list_remove_empty_items()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 16:01:28 -07:00
Taylor Blau 52acddf36c string-list: multi-delimiter string_list_split_in_place()
Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.

Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters comprised of
any characters in the delimiting set.

When only a single delimiting character is provided, `strpbrk(2)` (which
is implemented with `strcspn(2)`) has equivalent performance to
`strchr(2)`. Modern `strcspn(2)` implementations treat an empty
delimiter or the singleton delimiter as a special case and fall back to
calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)`
this way.

This change is one step to removing `strtok(2)` from the tree. Note that
`string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:

    string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1)

would yield a string list of:

    ["foo", "", "", "bar", "", "", "baz"]

Callers that wish to emulate the behavior of strtok(2) more directly
should call `string_list_remove_empty_items()` after splitting.

To avoid regressions for the new multi-character delimter cases, update
t0063 in this patch as well.

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 16:01:28 -07:00
Jacob Keller 603d0fdce2 blame: use different author name for fake commit generated by --contents
When the --contents option is used with git blame, and the contents of
the file have lines which can't be annotated by the history being
blamed, the user will see an author of "Not Committed Yet". This is
similar to the way blame handles working tree contents when blaming
without a revision.

This is slightly confusing since this data isn't the working copy and
while it is technically "not committed yet", its also coming from an
external file. Replace this author name with "External file
(--contents)" to better differentiate such lines from actual working
copy lines.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Glen Choo <chooglen@google.com>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 15:16:31 -07:00
Andrei Rybak 3d77fbb664 t1300: add tests for missing keys
There are several tests in t1300-config.sh that validate failing
invocations of "git config".  However, there are no tests that check
what happens when "git config" is asked to retrieve a value for a
missing key.

Add tests that check this for various combinations of "<section>.<key>"
and "<section>.<subsection>.<key>".

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 15:10:50 -07:00
Andrei Rybak 93f86046c9 t1300: check stderr for "ignores pairs" tests
Tests "git config ignores pairs ..." in t1300-config.sh validate that
"git config" ignores various kinds of supplied pairs of environment
variables GIT_CONFIG_KEY_* GIT_CONFIG_VALUE_* depending on
GIT_CONFIG_COUNT.  By "ignores" here we mean that "git config" abides by
the value of environment variable GIT_CONFIG_COUNT and doesn't use
key-value pairs outside of the supplied GIT_CONFIG_COUNT when trying to
produce a value for config key "pair.one".

These tests also validate that "git config" doesn't complain about
mismatched environment variables to standard error.  This is validated
by redirecting the standard error to a file called "error" and asserting
that it is empty.  However, two of these tests incorrectly redirect to
standard output while calling the file "error", and test 'git config
ignores pairs exceeding count' doesn't validate standard error at all.

Fix these tests by redirecting standard error to file "error" and
asserting its emptiness.

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 15:10:50 -07:00
Andrei Rybak f7f9a836e2 t1300: drop duplicate test
There are two almost identical tests called 'git config ignores pairs
with zero count' in file t1300-config.sh.  Drop the first of these and
keep the one that contains more assertions.

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 15:10:50 -07:00
Elijah Newren e3a3f5edf5 reftable: ensure git-compat-util.h is the first (indirect) include
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:33 -07:00
Elijah Newren 5e3f94dfe3 treewide: remove cache.h inclusion due to previous changes
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:33 -07:00
Elijah Newren d1cbe1e6d8 hash-ll.h: split out of hash.h to remove dependency on repository.h
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo).  However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id.  Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers.  It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:32 -07:00
Elijah Newren d4ff2072ab match-trees.h: move declarations for match-trees.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:32 -07:00
Elijah Newren 0ff73d742b packfile.h: move pack_window and pack_entry from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:31 -07:00
Elijah Newren 69a63fe663 treewide: be explicit about dependence on strbuf.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:31 -07:00
Junio C Hamano b64894c206 Merge branch 'ow/ref-filter-omit-empty'
"git branch --format=..." and "git format-patch --format=..."
learns "--omit-empty" to hide refs that whose formatting result
becomes an empty string from the output.

* ow/ref-filter-omit-empty:
  branch, for-each-ref, tag: add option to omit empty lines
2023-04-21 15:35:05 -07:00
Junio C Hamano 9e0d1aa495 Merge branch 'ah/format-patch-thread-doc'
Doc update.

* ah/format-patch-thread-doc:
  format-patch: correct documentation of --thread without an argument
2023-04-21 15:35:05 -07:00
Junio C Hamano 7ac228c994 Merge branch 'rn/sparse-describe'
"git describe --dirty" learns to work better with sparse-index.

* rn/sparse-describe:
  describe: enable sparse index for describe
2023-04-21 15:35:04 -07:00
Junio C Hamano de73a20756 Merge branch 'rs/archive-from-subdirectory-fixes'
"git archive" run from a subdirectory mishandled attributes and
paths outside the current directory.

* rs/archive-from-subdirectory-fixes:
  archive: improve support for running in subdirectory
2023-04-21 15:35:04 -07:00
M Hickford a5c76569e7 credential: new attribute oauth_refresh_token
Git authentication with OAuth access token is supported by every popular
Git host including GitHub, GitLab and BitBucket [1][2][3]. Credential
helpers Git Credential Manager (GCM) and git-credential-oauth generate
OAuth credentials [4][5]. Following RFC 6749, the application prints a
link for the user to authorize access in browser. A loopback redirect
communicates the response including access token to the application.

For security, RFC 6749 recommends that OAuth response also includes
expiry date and refresh token [6]. After expiry, applications can use
the refresh token to generate a new access token without user
reauthorization in browser. GitLab and BitBucket set the expiry at two
hours [2][3]. (GitHub doesn't populate expiry or refresh token.)

However the Git credential protocol has no attribute to store the OAuth
refresh token (unrecognised attributes are silently discarded). This
means that the user has to regularly reauthorize the helper in browser.
On a browserless system, this is particularly intrusive, requiring a
second device.

Introduce a new attribute oauth_refresh_token. This is especially
useful when a storage helper and a read-only OAuth helper are configured
together. Recall that `credential fill` calls each helper until it has a
non-expired password.

```
[credential]
	helper = storage  # eg. cache or osxkeychain
	helper = oauth
```

The OAuth helper can use the stored refresh token forwarded by
`credential fill` to generate a fresh access token without opening the
browser. See
https://github.com/hickford/git-credential-oauth/pull/3/files
for an implementation tested with this patch.

Add support for the new attribute to credential-cache. Eventually, I
hope to see support in other popular storage helpers.

Alternatives considered: ask helpers to store all unrecognised
attributes. This seems excessively complex for no obvious gain.
Helpers would also need extra information to distinguish between
confidential and non-confidential attributes.

Workarounds: GCM abuses the helper get/store/erase contract to store the
refresh token during credential *get* as the password for a fictitious
host [7] (I wrote this hack). This workaround is only feasible for a
monolithic helper with its own storage.

[1] https://github.blog/2012-09-21-easier-builds-and-deployments-using-git-over-https-and-oauth/
[2] https://docs.gitlab.com/ee/api/oauth2.html#access-git-over-https-with-access-token
[3] https://support.atlassian.com/bitbucket-cloud/docs/use-oauth-on-bitbucket-cloud/#Cloning-a-repository-with-an-access-token
[4] https://github.com/GitCredentialManager/git-credential-manager
[5] https://github.com/hickford/git-credential-oauth
[6] https://datatracker.ietf.org/doc/html/rfc6749#section-5.1
[7] 66b94e489a/src/shared/GitLab/GitLabHostProvider.cs (L207)

Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-21 09:38:30 -07:00
Junio C Hamano a4a4db8cf7 Merge branch 'gc/better-error-when-local-clone-fails-with-symlink'
"git clone --local" stops copying from an original repository that
has symbolic links inside its $GIT_DIR; an error message when that
happens has been updated.

* gc/better-error-when-local-clone-fails-with-symlink:
  clone: error specifically with --local and symlinked objects
2023-04-20 14:33:36 -07:00
Junio C Hamano 98c496fcd0 Merge branch 'ar/t2024-checkout-output-fix'
Test fix.

* ar/t2024-checkout-output-fix:
  t2024: fix loose/strict local base branch DWIM test
2023-04-20 14:33:36 -07:00
Junio C Hamano fa9172c70a Merge branch 'rs/remove-approxidate-relative'
The approxidate() API has been simplified by losing an extra
function that did the same thing as another one.

* rs/remove-approxidate-relative:
  date: remove approxidate_relative()
2023-04-20 14:33:35 -07:00
Junio C Hamano cbfe844aa1 Merge branch 'rs/userdiff-multibyte-regex'
The userdiff regexp patterns for various filetypes that are built
into the system have been updated to avoid triggering regexp errors
from UTF-8 aware regex engines.

* rs/userdiff-multibyte-regex:
  userdiff: support regexec(3) with multi-byte support
2023-04-20 14:33:35 -07:00
Michael Strawbridge a8022c5f7b send-email: expose header information to git-send-email's sendemail-validate hook
To allow further flexibility in the Git hook, the SMTP header
information of the email which git-send-email intends to send, is now
passed as the 2nd argument to the sendemail-validate hook.

As an example, this can be useful for acting upon keywords in the
subject or specific email addresses.

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Michael Strawbridge <michael.strawbridge@amd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-19 14:19:09 -07:00
Jeff King 7891e46585 gpg-interface: set trust level of missing key to "undefined"
In check_signature(), we initialize the trust_level field to "-1", with
the idea that if gpg does not return a trust level at all (if there is
no signature, or if the signature is made by an unknown key), we'll
use that value. But this has two problems:

  1. Since the field is an enum, it's up to the compiler to decide what
     underlying storage to use, and it only has to fit the values we've
     declared. So we may not be able to store "-1" at all. And indeed,
     on my system (linux with gcc), the resulting enum is an unsigned
     32-bit value, and -1 becomes 4294967295.

     The difference may seem academic (and you even get "-1" if you pass
     it to printf("%d")), but it means that code like this:

       status |= sigc->trust_level < configured_min_trust_level;

     does not necessarily behave as expected. This turns out not to be a
     bug in practice, though, because we keep the "-1" only when gpg did
     not report a signature from a known key, in which case the line
     above:

       status |= sigc->result != 'G';

     would always set status to non-zero anyway. So only a 'G' signature
     with no parsed trust level would cause a problem, which doesn't
     seem likely to trigger (outside of unexpected gpg behavior).

  2. When using the "%GT" format placeholder, we pass the value to
     gpg_trust_level_to_str(), which complains that the value is out of
     range with a BUG(). This behavior was introduced by 803978da49
     (gpg-interface: add function for converting trust level to string,
     2022-07-11). Before that, we just did a switch() on the enum, and
     anything that wasn't matched would end up as the empty string.

     Curiously, solving this by naively doing:

       if (level < 0)
               return "";

     in that function isn't sufficient. Because of (1) above, the
     compiler can (and does in my case) actually remove that conditional
     as dead code!

We can solve both by representing this state as an enum value. We could
do this by adding a new "unknown" value. But this really seems to match
the existing "undefined" level well. GPG describes this as "Not enough
information for calculation".

We have tests in t7510 that trigger this case (verifying a signature
from a key that we don't have, and then checking various %G
placeholders), but they didn't notice the BUG() because we didn't look
at %GT for that case! Let's make sure we check all %G placeholders for
each case in the formatting tests.

The interesting ones here are "show unknown signature with custom
format" and "show lack of signature with custom format", both of which
would BUG() before, and now turn %GT into "undefined". Prior to
803978da49 they would have turned it into the empty string, but I think
saying "undefined" consistently is a reasonable outcome, and probably
makes life easier for anyone parsing the output (and any such parser had
to be ready to see "undefined" already).

The other modified tests produce the same output before and after this
patch, but now we're consistently checking both %G? and %GT in all of
them.

Signed-off-by: Jeff King <peff@peff.net>
Reported-by: Rolf Eike Beer <eb@emlix.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-19 08:30:54 -07:00
Taylor Blau e3e24de1bf builtin/gc.c: make gc.cruftPacks enabled by default
Back in 5b92477f89 (builtin/gc.c: conditionally avoid pruning objects
via loose, 2022-05-20), `git gc` learned the `--cruft` option and
`gc.cruftPacks` configuration to opt-in to writing cruft packs when
collecting or pruning unreachable objects.

Cruft packs were introduced with the merge in a50036da1a (Merge branch
'tb/cruft-packs', 2022-06-03). They address the problem of "loose object
explosions", where Git will write out many individual loose objects when
there is a large number of unreachable objects that have not yet aged
past `--prune=<date>`.

Instead of keeping track of those unreachable yet recent objects via
their loose object file's mtime, cruft packs collect all unreachable
objects into a single pack with a corresponding `*.mtimes` file that
acts as a table to store the mtimes of all unreachable objects. This
prevents the need to store unreachable objects as loose as they age out
of the repository, and avoids the problem of loose object explosions.

Beyond avoiding loose object explosions, cruft packs also act as a more
efficient mechanism to store unreachable objects as they age out of a
repository. This is because pairs of similar unreachable objects serve
as delta bases for one another.

In 5b92477f89, the feature was introduced as experimental. Since then,
GitHub has been running these patches in every repository generating
hundreds of millions of cruft packs along the way. The feature is
battle-tested, and avoids many pathological cases such as above. Users
who either run `git gc` manually, or via `git maintenance` can benefit
from having cruft packs.

As such, enable cruft pack generation to take place by default (by
making `gc.cruftPacks` have the default of "true" rather than "false).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:48 -07:00
Taylor Blau c58100ab5d t/t9300-fast-import.sh: prepare for gc --cruft by default
In a similar fashion as previous commits, adjust the fast-import tests
to prepare for "git gc" generating a cruft pack by default.

This adjustment is slightly different, however. Instead of relying on us
writing out the objects loose, and then calling `git prune` to remove
them, t9300 needs to be prepared to drop objects that would be moved
into cruft packs.

To do this, we can combine the `git gc` invocation with `git prune` into
one `git gc --prune`, which handles pruning both loose objects, and
objects that would otherwise be written to a cruft pack.

Likely this pattern of "git gc && git prune" started all the way back in
03db4525d3 (Support gitlinks in fast-import., 2008-07-19), which
happened after deprecating `git gc --prune` in 9e7d501990 (builtin-gc.c:
deprecate --prune, it now really has no effect, 2008-05-09).

After `--prune` was un-deprecated in 58e9d9d472 (gc: make --prune useful
again by accepting an optional parameter, 2009-02-14), this script got a
handful of new "git gc && git prune" instances via via 4cedb78cb5
(fast-import: add input format tests, 2011-08-11). These could have been
`git gc --prune`, but weren't (likely taking after 03db4525d3).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:48 -07:00
Taylor Blau b9061bc628 t/t6500-gc.sh: add additional test cases
In the last commit, we refactored some of the tests in t6500 to make
clearer when cruft packs will and won't be generated by `git gc`.

Add the remaining cases not covered by the previous patch into this one,
which enumerates all possible combinations of arguments that will
produce (or not produce) a cruft pack.

This prepares us for a future commit which will change the default value
of `gc.cruftPacks` by ensuring that we understand which invocations do
and do not change as a result.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:48 -07:00
Taylor Blau 50685e0e0b t/t6500-gc.sh: refactor cruft pack tests
In 12253ab6d0 (gc: add tests for --cruft and friends, 2022-10-26), we
added a handful of tests to t6500 to ensure that `git gc` respected the
value of `--cruft` and `gc.cruftPacks`.

Then, in c695592850 (config: let feature.experimental imply
gc.cruftPacks=true, 2022-10-26), another set of similar tests was added
to ensure that `feature.experimental` correctly implied enabling cruft
pack generation (or not).

These tests are similar and could be consolidated. Do so in this patch
to prepare for expanding the set of command-line invocations that enable
or disable writing cruft packs. This makes it possible to easily test
more combinations of arguments without being overly repetitive.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:48 -07:00
Taylor Blau b31d45b831 t/t6501-freshen-objects.sh: prepare for gc --cruft by default
In a similar spirit as previous commits, prepare for `gc --cruft`
becoming the default by ensuring that the tests in t6501 explicitly
cover the case of freshening loose objects not using cruft packs.

We could run this test twice, once with `--cruft` and once with
`--no-cruft`, but doing so is unnecessary, since we already test object
rescuing, freshening, and dealing with corrupt parts of the unreachable
object graph extensively via t5329.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:47 -07:00
Taylor Blau b934207a22 t/t5304-prune.sh: prepare for gc --cruft by default
Many of the tests in t5304 run `git gc`, and rely on its behavior that
unreachable-but-recent objects are written out loose. This is sensible,
since t5304 deals specifically with this kind of pruning.

If left unattended, however, this test would break when the default
behavior of a bare "git gc" is adjusted to generate a cruft pack by
default.

Ensure that these tests continue to work as-is (and continue to provide
coverage of loose object pruning) by passing `--no-cruft` explicitly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:47 -07:00
Taylor Blau 05b9013b71 builtin/gc.c: ignore cruft packs with --keep-largest-pack
When cruft packs were implemented, we never adjusted the code for `git
gc`'s `--keep-largest-pack` and `gc.bigPackThreshold` to ignore cruft
packs. This option and configuration option share a common
implementation, but including cruft packs is wrong in both cases:

  - Running `git gc --keep-largest-pack` in a repository where the
    largest pack is the cruft pack itself will make it impossible for
    `git gc` to prune objects, since the cruft pack itself is kept.

  - The same is true for `gc.bigPackThreshold`, if the size of the cruft
    pack exceeds the limit set by the caller.

In the future, it is possible that `gc.bigPackThreshold` could be used
to write a separate cruft pack containing any new unreachable objects
that entered the repository since the last time a cruft pack was
written.

There are some complexities to doing so, mainly around handling
pruning objects that are in an existing cruft pack that is above the
threshold (which would either need to be rewritten, or else delay
pruning). Rewriting a substantially similar cruft pack isn't ideal, but
it is significantly better than the status-quo.

If users have large cruft packs that they don't want to rewrite, they
can mark them as `*.keep` packs. But in general, if a repository has a
cruft pack that is so large it is slowing down GC's, it should probably
be pruned anyway.

In the meantime, ignore cruft packs in the common implementation for
both of these options, and add a pair of tests to prevent any future
regressions here.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:47 -07:00
Junio C Hamano 3c957e6d39 Merge branch 'pw/rebase-cleanup-merge-strategy-option-handling'
Clean-up of the code path that deals with merge strategy option
handling in "git rebase".

* pw/rebase-cleanup-merge-strategy-option-handling:
  rebase: remove a couple of redundant strategy tests
  rebase -m: fix serialization of strategy options
  rebase -m: cleanup --strategy-option handling
  sequencer: use struct strvec to store merge strategy options
  rebase: stop reading and writing unnecessary strategy state
2023-04-17 18:05:13 -07:00
Junio C Hamano 9d8370d445 Merge branch 'tk/mergetool-gui-default-config'
"git mergetool" and "git difftool" learns a new configuration
guiDefault to optionally favor configured guitool over non-gui-tool
automatically when $DISPLAY is set.

* tk/mergetool-gui-default-config:
  mergetool: new config guiDefault supports auto-toggling gui by DISPLAY
2023-04-17 18:05:11 -07:00
Junio C Hamano d47ee0a565 Merge branch 'sl/sparse-write-tree'
"git write-tree" learns to work better with sparse-index.

* sl/sparse-write-tree:
  write-tree: integrate with sparse index
2023-04-17 18:05:11 -07:00
Derrick Stolee 5a6072f631 fsck: validate .rev file header
While parsing a .rev file, we check the header information to be sure it
makes sense. This happens before doing any additional validation such as
a checksum or value check. In order to differentiate between a bad
header and a non-existent file, we need to update the API for loading a
reverse index.

Make load_pack_revindex_from_disk() non-static and specify that a
positive value means "the file does not exist" while other errors during
parsing are negative values. Since an invalid header prevents setting up
the structures we would use for further validations, we can stop at that
point.

The place where we can distinguish between a missing file and a corrupt
file is inside load_revindex_from_disk(), which is used both by pack
rev-indexes and multi-pack-index rev-indexes. Some tests in t5326
demonstrate that it is critical to take some conditions to allow
positive error signals.

Add tests that check the three header values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-17 14:39:05 -07:00
Derrick Stolee 5f658d1b57 fsck: check rev-index position values
When checking a rev-index file, it may be helpful to identify exactly
which positions are incorrect. Compare the rev-index to a
freshly-computed in-memory rev-index and report the comparison failures.

This additional check (on top of the checksum validation) can help find
files that were corrupt by a single bit flip on-disk or perhaps were
written incorrectly due to a bug in Git.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-17 14:39:04 -07:00
Derrick Stolee d975fe1fa5 fsck: check rev-index checksums
The previous change added calls to verify_pack_revindex() in
builtin/fsck.c, but the implementation of the method was left empty. Add
the first and most-obvious check to this method: checksum verification.

While here, create a helper method in the test script that makes it easy
to adjust the .rev file and check that 'git fsck' reports the correct
error message.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-17 14:39:04 -07:00
Derrick Stolee 0d30feef3c fsck: create scaffolding for rev-index checks
The 'fsck' builtin checks many of Git's on-disk data structures, but
does not currently validate the pack rev-index files (a .rev file to
pair with a .pack and .idx file).

Before doing a more-involved check process, create the scaffolding
within builtin/fsck.c to have a new error type and add that error type
when the API method verify_pack_revindex() returns an error. That method
does nothing currently, but we will add checks to it in later changes.

For now, check that 'git fsck' succeeds without any errors in the normal
case. Future checks will be paired with tests that corrupt the .rev file
appropriately.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-17 14:39:04 -07:00
Johannes Schindelin 3d3c11852c Sync with 2.39.3
* maint-2.39: (34 commits)
  Git 2.39.3
  Git 2.38.5
  Git 2.37.7
  Git 2.36.6
  Git 2.35.8
  Makefile: force -O0 when compiling with SANITIZE=leak
  Git 2.34.8
  Git 2.33.8
  Git 2.32.7
  Git 2.31.8
  tests: avoid using `test_i18ncmp`
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  range-diff: use ssize_t for parsed "len" in read_patches()
  ...
2023-04-17 21:16:10 +02:00
Johannes Schindelin 15628975cf Sync with 2.38.5
* maint-2.38: (32 commits)
  Git 2.38.5
  Git 2.37.7
  Git 2.36.6
  Git 2.35.8
  Git 2.34.8
  Git 2.33.8
  Git 2.32.7
  Git 2.31.8
  tests: avoid using `test_i18ncmp`
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  range-diff: use ssize_t for parsed "len" in read_patches()
  range-diff: handle unterminated lines in read_patches()
  range-diff: drop useless "offset" variable from read_patches()
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  t0003: GETTEXT_POISON fix, conclusion
  ...
2023-04-17 21:16:08 +02:00
Johannes Schindelin c96ecfe6a5 Sync with 2.37.7
* maint-2.37: (31 commits)
  Git 2.37.7
  Git 2.36.6
  Git 2.35.8
  Git 2.34.8
  Git 2.33.8
  Git 2.32.7
  Git 2.31.8
  tests: avoid using `test_i18ncmp`
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  range-diff: use ssize_t for parsed "len" in read_patches()
  range-diff: handle unterminated lines in read_patches()
  range-diff: drop useless "offset" variable from read_patches()
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  t0003: GETTEXT_POISON fix, conclusion
  t0003: GETTEXT_POISON fix, part 1
  ...
2023-04-17 21:16:06 +02:00
Johannes Schindelin 1df551ce5c Sync with 2.36.6
* maint-2.36: (30 commits)
  Git 2.36.6
  Git 2.35.8
  Git 2.34.8
  Git 2.33.8
  Git 2.32.7
  Git 2.31.8
  tests: avoid using `test_i18ncmp`
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  range-diff: use ssize_t for parsed "len" in read_patches()
  range-diff: handle unterminated lines in read_patches()
  range-diff: drop useless "offset" variable from read_patches()
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  t0003: GETTEXT_POISON fix, conclusion
  t0003: GETTEXT_POISON fix, part 1
  t0033: GETTEXT_POISON fix
  ...
2023-04-17 21:16:04 +02:00
Johannes Schindelin 62298def14 Sync with 2.35.8
* maint-2.35: (29 commits)
  Git 2.35.8
  Git 2.34.8
  Git 2.33.8
  Git 2.32.7
  Git 2.31.8
  tests: avoid using `test_i18ncmp`
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  range-diff: use ssize_t for parsed "len" in read_patches()
  range-diff: handle unterminated lines in read_patches()
  range-diff: drop useless "offset" variable from read_patches()
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  t0003: GETTEXT_POISON fix, conclusion
  t0003: GETTEXT_POISON fix, part 1
  t0033: GETTEXT_POISON fix
  http: support CURLOPT_PROTOCOLS_STR
  ...
2023-04-17 21:16:02 +02:00
Johannes Schindelin 8cd052ea53 Sync with 2.34.8
* maint-2.34: (28 commits)
  Git 2.34.8
  Git 2.33.8
  Git 2.32.7
  Git 2.31.8
  tests: avoid using `test_i18ncmp`
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  range-diff: use ssize_t for parsed "len" in read_patches()
  range-diff: handle unterminated lines in read_patches()
  range-diff: drop useless "offset" variable from read_patches()
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  t0003: GETTEXT_POISON fix, conclusion
  t0003: GETTEXT_POISON fix, part 1
  t0033: GETTEXT_POISON fix
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  ...
2023-04-17 21:15:59 +02:00
Johannes Schindelin d6e9f67a8e Sync with 2.33.8
* maint-2.33: (27 commits)
  Git 2.33.8
  Git 2.32.7
  Git 2.31.8
  tests: avoid using `test_i18ncmp`
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  range-diff: use ssize_t for parsed "len" in read_patches()
  range-diff: handle unterminated lines in read_patches()
  range-diff: drop useless "offset" variable from read_patches()
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  t0003: GETTEXT_POISON fix, conclusion
  t0003: GETTEXT_POISON fix, part 1
  t0033: GETTEXT_POISON fix
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  ...
2023-04-17 21:15:56 +02:00
Johannes Schindelin bcd874d50f Sync with 2.32.7
* maint-2.32: (26 commits)
  Git 2.32.7
  Git 2.31.8
  tests: avoid using `test_i18ncmp`
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  range-diff: use ssize_t for parsed "len" in read_patches()
  range-diff: handle unterminated lines in read_patches()
  range-diff: drop useless "offset" variable from read_patches()
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  t0003: GETTEXT_POISON fix, conclusion
  t0003: GETTEXT_POISON fix, part 1
  t0033: GETTEXT_POISON fix
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  ci: install python on ubuntu
  ...
2023-04-17 21:15:52 +02:00
Johannes Schindelin 31f7fe5e34 Sync with 2.31.8
* maint-2.31: (25 commits)
  Git 2.31.8
  tests: avoid using `test_i18ncmp`
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  range-diff: use ssize_t for parsed "len" in read_patches()
  range-diff: handle unterminated lines in read_patches()
  range-diff: drop useless "offset" variable from read_patches()
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  t0003: GETTEXT_POISON fix, conclusion
  t0003: GETTEXT_POISON fix, part 1
  t0033: GETTEXT_POISON fix
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  ci: install python on ubuntu
  ci: use the same version of p4 on both Linux and macOS
  ...
2023-04-17 21:15:49 +02:00
Johannes Schindelin 92957d8427 tests: avoid using test_i18ncmp
Since `test_i18ncmp` was deprecated in v2.31.*, the instances added in
v2.30.9 needed to be converted to `test_cmp` calls.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2023-04-17 21:15:45 +02:00
Johannes Schindelin b524e896b6 Sync with 2.30.9
* maint-2.30: (23 commits)
  Git 2.30.9
  gettext: avoid using gettext if the locale dir is not present
  apply --reject: overwrite existing `.rej` symlink if it exists
  http.c: clear the 'finished' member once we are done with it
  clone.c: avoid "exceeds maximum object size" error with GCC v12.x
  range-diff: use ssize_t for parsed "len" in read_patches()
  range-diff: handle unterminated lines in read_patches()
  range-diff: drop useless "offset" variable from read_patches()
  t5604: GETTEXT_POISON fix, conclusion
  t5604: GETTEXT_POISON fix, part 1
  t5619: GETTEXT_POISON fix
  t0003: GETTEXT_POISON fix, conclusion
  t0003: GETTEXT_POISON fix, part 1
  t0033: GETTEXT_POISON fix
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  ci: install python on ubuntu
  ci: use the same version of p4 on both Linux and macOS
  ci: remove the pipe after "p4 -V" to catch errors
  github-actions: run gcc-8 on ubuntu-20.04 image
  ...
2023-04-17 21:15:44 +02:00
Taylor Blau 528290f8c6 Merge branch 'tb/config-copy-or-rename-in-file-injection'
Avoids issues with renaming or deleting sections with long lines, where
configuration values may be interpreted as sections, leading to
configuration injection. Addresses CVE-2023-29007.

* tb/config-copy-or-rename-in-file-injection:
  config.c: disallow overly-long lines in `copy_or_rename_section_in_file()`
  config.c: avoid integer truncation in `copy_or_rename_section_in_file()`
  config: avoid fixed-sized buffer when renaming/deleting a section
  t1300: demonstrate failure when renaming sections with long lines

Signed-off-by: Taylor Blau <me@ttaylorr.com>
2023-04-17 21:15:42 +02:00
Taylor Blau 3bb3d6bac5 config.c: disallow overly-long lines in copy_or_rename_section_in_file()
As a defense-in-depth measure to guard against any potentially-unknown
buffer overflows in `copy_or_rename_section_in_file()`, refuse to work
with overly-long lines in a gitconfig.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
2023-04-17 21:15:40 +02:00
Taylor Blau a5bb10fd5e config: avoid fixed-sized buffer when renaming/deleting a section
When renaming (or deleting) a section of configuration, Git uses the
function `git_config_copy_or_rename_section_in_file()` to rewrite the
configuration file after applying the rename or deletion to the given
section.

To do this, Git repeatedly calls `fgets()` to read the existing
configuration data into a fixed size buffer.

When the configuration value under `old_name` exceeds the size of the
buffer, we will call `fgets()` an additional time even if there is no
newline in the configuration file, since our read length is capped at
`sizeof(buf)`.

If the first character of the buffer (after zero or more characters
satisfying `isspace()`) is a '[', Git will incorrectly treat it as
beginning a new section when the original section is being removed. In
other words, a configuration value satisfying this criteria can
incorrectly be considered as a new secftion instead of a variable in the
original section.

Avoid this issue by using a variable-width buffer in the form of a
strbuf rather than a fixed-with region on the stack. A couple of small
points worth noting:

  - Using a strbuf will cause us to allocate arbitrary sizes to match
    the length of each line.  In practice, we don't expect any
    reasonable configuration files to have lines that long, and a
    bandaid will be introduced in a later patch to ensure that this is
    the case.

  - We are using strbuf_getwholeline() here instead of strbuf_getline()
    in order to match `fgets()`'s behavior of leaving the trailing LF
    character on the buffer (as well as a trailing NUL).

    This could be changed later, but using strbuf_getwholeline() changes
    the least about this function's implementation, so it is picked as
    the safest path.

  - It is temping to want to replace the loop to skip over characters
    matching isspace() at the beginning of the buffer with a convenience
    function like `strbuf_ltrim()`. But this is the wrong approach for a
    couple of reasons:

    First, it involves a potentially large and expensive `memmove()`
    which we would like to avoid. Second, and more importantly, we also
    *do* want to preserve those spaces to avoid changing the output of
    other sections.

In all, this patch is a minimal replacement of the fixed-width buffer in
`git_config_copy_or_rename_section_in_file()` to instead use a `struct
strbuf`.

Reported-by: André Baptista <andre@ethiack.com>
Reported-by: Vítor Pinho <vitor@ethiack.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Co-authored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2023-04-17 21:15:40 +02:00
Taylor Blau 29198213c9 t1300: demonstrate failure when renaming sections with long lines
When renaming a configuration section which has an entry whose length
exceeds the size of our buffer in config.c's implementation of
`git_config_copy_or_rename_section_in_file()`, Git will incorrectly
form a new configuration section with part of the data in the section
being removed.

In this instance, our first configuration file looks something like:

    [b]
      c = d <spaces> [a] e = f
    [a]
      g = h

Here, we have two configuration values, "b.c", and "a.g". The value "[a]
e = f" belongs to the configuration value "b.c", and does not form its
own section.

However, when renaming the section 'a' to 'xyz', Git will write back
"[xyz]\ne = f", but "[xyz]" is still attached to the value of "b.c",
which is why "e = f" on its own line becomes a new entry called "b.e".

A slightly different example embeds the section being renamed within
another section.

Demonstrate this failure in a test in t1300, which we will fix in the
following commit.

Co-authored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2023-04-17 21:15:39 +02:00
Johannes Schindelin 9db05711c9 apply --reject: overwrite existing .rej symlink if it exists
The `git apply --reject` is expected to write out `.rej` files in case
one or more hunks fail to apply cleanly. Historically, the command
overwrites any existing `.rej` files. The idea being that
apply/reject/edit cycles are relatively common, and the generated `.rej`
files are not considered precious.

But the command does not overwrite existing `.rej` symbolic links, and
instead follows them. This is unsafe because the same patch could
potentially create such a symbolic link and point at arbitrary paths
outside the current worktree, and `git apply` would write the contents
of the `.rej` file into that location.

Therefore, let's make sure that any existing `.rej` file or symbolic
link is removed before writing it.

Reported-by: RyotaK <ryotak.mail@gmail.com>
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2023-04-17 21:15:38 +02:00
Jeff King c4716236f2 t5512: test "ls-remote --heads --symref" filtering with v0 and v2
We have two overlapping tests for checking the behavior of "ls-remote
--symref" when filtering output. The first test checks that using
"--heads" will omit the symref for HEAD (since we don't print anything
about HEAD at all), but still prints other symrefs.

This has been marked as expecting failure since it was added in
99c08d4eb2 (ls-remote: add support for showing symrefs, 2016-01-19).
That's because back then, we only had the v0 protocol, and it only
reported on the HEAD symref, not others. But these days we have v2,
which does exactly what the test wants. It would even have started
unexpectedly passing when we switched to v2 by default, except that
b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
2019-02-25) over-zealously marked it to run only in v0 mode.

So let's run it with both protocol versions, and adjust the expected
output for each. It passes in v2 without modification. In v0 mode, we'll
drop the extra symref, but this is still testing something useful: it
ensures that we do omit HEAD.

The test after this checks "--heads" again, this time using the expected
v0 output. That's now redundant. It also checks that limiting with a
pattern like "refs/heads/*" works similarly, but that's redundant with a
test earlier in the script which limits by HEAD (again, back then the
"HEAD" test was less interesting because there were no other symrefs to
omit, but in a modern v2 world, there are). So we can just delete that
second test entirely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 15:08:13 -07:00
Jeff King d6747adfa8 t5512: allow any protocol version for filtered symref test
We have a test that checks that ls-remote, when asked only about HEAD,
will report the HEAD symref, and not others. This was marked to always
run with the v0 protocol by b2f73b70b2 (t5512: compensate for v0 only
sending HEAD symrefs, 2019-02-25).

But in v0 this test is doing nothing! For v0, upload-pack only reports
the HEAD symref anyway, so we'd never have any other symref to report.

For v2, it is useful; we learn about all symrefs (and the test repo has
multiple), so this demonstrates that we correctly avoid showing them.

We could perhaps mark this to test explicitly with v2, but since that is
the default these days, it's sufficient to just run ls-remote without
any protocol specification. It still passes if somebody does an explicit
GIT_TEST_PROTOCOL_VERSION=0; it's just testing less.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 15:08:12 -07:00
Jeff King 20272ee8cf t5512: add v2 support for "ls-remote --symref" test
Commit b2f73b70b2 (t5512: compensate for v0 only sending HEAD symrefs,
2019-02-25) configured this test to always run with protocol v0, since
the output is different for v2.

But that means we are not getting any test coverage of the feature with
v2 at all. We could obviously switch to using and expecting v2, but then
that leaves v0 behind (and while we don't use it by default, it's still
important for testing interoperability with older servers). Likewise, we
could switch the expected output based on $GIT_TEST_PROTOCOL_VERSION,
but hardly anybody runs the tests for v0 these days.

Instead, let's explicitly run it for both protocol versions to make sure
they're well behaved. This matches other similar tests added later in
6a139cdd74 (ls-remote: pass heads/tags prefixes to transport,
2018-10-31), etc.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 15:08:12 -07:00
Jeff King 13e67aa39b v0 protocol: fix sha1/sha256 confusion for capabilities^{}
Commit eb398797cd (connect: advertized capability is not a ref,
2016-09-09) added support for an upload-pack server responding with:

  0000000000000000000000000000000000000000        capabilities^{}

followed by a NUL and the actual capabilities. We correctly parse the
oid using the packet_reader's hash_algo field, but then we compare it to
null_oid(), which will instead use our current repo's default algorithm.
If we're defaulting to sha256 locally but the other side is sha1, they
won't match and we'll fail to parse the line (and thus die()).

This can cause a test failure when the suite is run with
GIT_TEST_DEFAULT_HASH=sha256, and we even do so regularly via the
linux-sha256 CI job. But since the test requires JGit to run, it's
usually just skipped, and nobody noticed the problem.

The reason the original patch used JGit is that Git itself does not ever
produce such a line via upload-pack; the feature was added to fix a
real-world problem when interacting with JGit. That was good for
verifying that the incompatibility was fixed, but it's not a good
regression test:

  - hardly anybody runs it, because you have to have jgit installed;
    hence this bug going unnoticed

  - we're depending on jgit's behavior for the test to do anything
    useful. In particular, this behavior is only relevant to the v0
    protocol, but these days we ask for the v2 protocol by default. So
    for modern jgit, this is probably testing nothing.

  - it's complicated and slow. We had to do some fifo trickery to handle
    races, and this one test makes up 40% of the runtime of the total
    script.

Instead, let's just hard-code the response that's of interest to us.
That will test exactly what we want for every run, and reveals the bug
when run in sha256 mode. And of course we'll fix the actual bug by using
the correct hash_algo struct.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 15:08:12 -07:00
Jeff King e6c4309748 t5512: stop referring to "v1" protocol
There really isn't a "v1" Git protocol. It's just v0 with an extra probe
which we used to test compatibility in preparation for v2. Any tests
that are looking for before/after behavior for v2 really care about
"v0". Mentioning "v1" in these tests is just making things more
confusing, because we don't care about that probe; we're really testing
v0. So let's say so.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 15:08:12 -07:00
Jeff King aa962fef27 v0 protocol: fix infinite loop when parsing multi-valued capabilities
If Git's client-side parsing of an upload-pack response (so git-fetch or
ls-remote) sees multiple instances of a single capability, it can enter
an infinite loop due to a bug in advancing the "offset" parameter in the
parser.

This bug can't happen between a client and server of the same Git
version. The client bug is in parse_feature_value() when the caller
passes in an offset parameter. And that only happens when the v0
protocol is parsing "symref" and "object-format" capabilities, via
next_server_feature_value(). But Git has never produced multiple
object-format capabilities, and it stopped producing multiple symref
values in d007dbf7d6 (Revert "upload-pack: send non-HEAD symbolic refs",
2013-11-18).

However, upload-pack did produce multiple symref entries for a while,
and they are valid. Plus other implementations, such as Dulwich will
still do so. So we should handle them. And even if we do not expect it,
it is obviously a bug for the parser to enter an infinite loop.

The bug itself is pretty simple. Commit 2c6a403d96 (connect: add
function to parse multiple v1 capability values, 2020-05-25) added the
"offset" parameter, which is used as both an in- and out-parameter. When
parsing the first "symref" capability, *offset will be 0 on input, and
after parsing the capability, we set *offset to an index just past the
value by taking a pointer difference "(value + end) - feature_list".

But on the second call, now *offset is set to that larger index, which
lets us skip past the first "symref" capability. However, we do so by
incrementing feature_list. That means our pointer difference is now too
small; it is counting from where we resumed parsing, not from the start
of the original feature_list pointer. And because we incremented
feature_list only inside our function, and not the caller, that
increment is lost next time the function is called.

One solution would be to account for those skipped bytes by incrementing
*offset, rather than assigning to it. But wait, there's more!

We also increment feature_list if we have a near-miss. Say we are
looking for "symref" and find "almost-symref". In that case we'll point
feature_list to the "y" in "almost-symref" and restart our search. But
that again means our offset won't be correct, as it won't account for
the bytes between the start of the string and that "y".

So instead, let's just record the beginning of the feature_list string
in a separate pointer that we never touch. That offset we take in and
return is meant to be using that point as a base, and now we'll do so
consistently.

Since the bug can't be reproduced using the current version of
git-upload-pack, we'll instead hard-code an input which triggers the
problem. Before this patch it loops forever re-parsing the second symref
entry. Now we check both that it finishes, and that it parses both
entries correctly (a case we could not test at all before).

We don't need to worry about testing v2 here; it communicates the
capabilities in a completely different way, and doesn't use this code at
all. There are tests earlier in t5512 that are meant to cover this (they
don't, but we'll address that in a future patch).

Reported-by: Jonas Haag <jonas@lophus.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 15:08:12 -07:00
Robin Jarry 3c8d3adeae send-email: export patch counters in validate environment
When sending patch series (with a cover-letter or not)
sendemail-validate is called with every email/patch file independently
from the others. When one of the patches depends on a previous one, it
may not be possible to use this hook in a meaningful way. A hook that
wants to check some property of the whole series needs to know which
patch is the final one.

Expose the current and total number of patches to the hook via the
GIT_SENDEMAIL_PATCH_COUNTER and GIT_SENDEMAIL_PATCH_TOTAL environment
variables so that both incremental and global validation is possible.

Sharing any other state between successive invocations of the validate
hook must be done via external means. For example, by storing it in
a git config sendemail.validateWorktree entry.

Add a sample script with placeholder validations and update tests to
check that the counters are properly exported.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Robin Jarry <robin@jarry.cc>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:41:15 -07:00
Patrick Steinhardt d85cd18777 repack: disable writing bitmaps when doing a local repack
In order to write a bitmap, we need to have full coverage of all objects
that are about to be packed. In the traditional non-multi-pack-index
world this meant we need to do a full repack of all objects into a
single packfile. But in the new multi-pack-index world we can get away
with writing bitmaps when we have multiple packfiles as long as the
multi-pack-index covers all objects.

This is not always the case though. When asked to perform a repack of
local objects, only, then we cannot guarantee to have full coverage of
all objects regardless of whether we do a full repack or a repack with a
multi-pack-index. The end result is that writing the bitmap will fail in
both worlds:

    $ git multi-pack-index write --stdin-packs --bitmap <packfiles
    warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
    error: could not write multi-pack bitmap

Now there are two different ways to fix this. The first one would be to
amend git-multi-pack-index(1) to disable writing bitmaps when we notice
that we don't have full object coverage.

    - We don't have enough information in git-multi-pack-index(1) in
      order to tell whether the local repository _should_ have full
      coverage. Because even when connected to an alternate object
      directory, it may be the case that we still have all objects
      around in the main object database.

    - git-multi-pack-index(1) is quite a low-level tool. Automatically
      disabling functionality that it was asked to provide does not feel
      like the right thing to do.

We can easily fix it at a higher level in git-repack(1) though. When
asked to only include local objects via `-l` and when connected to an
alternate object directory then we will override the user's ask and
disable writing bitmaps with a warning. This is similar to what we do in
git-pack-objects(1), where we also disable writing bitmaps in case we
omit an object from the pack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:52 -07:00
Patrick Steinhardt 932c16c04b repack: honor -l when calculating pack geometry
When the user passes `-l` to git-repack(1), then they essentially ask us
to only repack objects part of the local object database while ignoring
any packfiles part of an alternate object database. And we in fact honor
this bit when doing a geometric repack as the resulting packfile will
only ever contain local objects.

What we're missing though is that we don't take locality of packfiles
into account when computing whether the geometric sequence is intact or
not. So even though we would only ever roll up local packfiles anyway,
we could end up trying to repack because of non-local packfiles. This
does not make much sense, and in the worst case it can cause us to try
and do the geometric repack over and over again because we're never able
to restore the geometric sequence.

Fix this bug by honoring whether the user has passed `-l`. If so, we
skip adding any non-local packfiles to the pack geometry.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:52 -07:00
Patrick Steinhardt 19a3a7bde9 t/helper: allow chmtime to print verbosely without modifying mtime
The `test-tool chmtime` helper allows us to both read and modify the
modification time of files. But while it is possible to only read the
mtimes of a file via `--get`, it is not possible to read the mtimes
and report them together with their respective file paths via the
`--verbose` flag without also modifying the mtime at the same time.

Fix this so that it is possible to call `test-tool chmtime --verbose
<files>...` without modifying any mtimes.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:52 -07:00
Patrick Steinhardt f3028418c3 pack-objects: extend test coverage of --stdin-packs with alternates
We don't have any tests that verify that git-pack-objects(1) works with
`--stdin-packs` when combined with alternate object directories. Add
some to make sure that the basic functionality works as expected.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:52 -07:00
Patrick Steinhardt 752b465c3c pack-objects: fix error when same packfile is included and excluded
When passing the same packfile both as included and excluded via the
`--stdin-packs` option, then we will return an error because the
excluded packfile cannot be found. This is because we will only set the
`util` pointer for the included packfile list if it was found, so that
we later die when we notice that it's in fact not set for the excluded
packfile list.

Fix this bug by always setting the `util` pointer for both the included
and excluded list entries.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:51 -07:00
Patrick Steinhardt 732194b5f2 pack-objects: fix error when packing same pack twice
When passed the same packfile twice via `--stdin-packs` we return an
error that the packfile supposedly was not found. This is because when
reading packs into the list of included or excluded packfiles, we will
happily re-add packfiles even if they are part of the lists already. And
while the list can now contain duplicates, we will only set the `util`
pointer of the first list entry to the `packed_git` structure. We notice
that at a later point when checking that all list entries have their
`util` pointer set and die with an error.

While this is kind of a nonsensical request, this scenario can be hit
when doing geometric repacks. When a repository is connected to an
alternate object directory and both have the exact same packfile then
both would get added to the geometric sequence. And when we then decide
to perform the repack, we will invoke git-pack-objects(1) with the same
packfile twice.

Fix this bug by removing any duplicates from both the included and
excluded packs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:51 -07:00
Patrick Steinhardt b7b8f048f5 pack-objects: split out --stdin-packs tests into separate file
The test suite for git-pack-objects(1) is quite huge, and we're about to
add more tests that relate to the `--stdin-packs` option. Split out all
tests related to this option into a standalone file so that it becomes
easier to test the feature in isolation.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:51 -07:00