Commit graph

20788 commits

Author SHA1 Message Date
Derrick Stolee 0d30feef3c fsck: create scaffolding for rev-index checks
The 'fsck' builtin checks many of Git's on-disk data structures, but
does not currently validate the pack rev-index files (a .rev file to
pair with a .pack and .idx file).

Before doing a more-involved check process, create the scaffolding
within builtin/fsck.c to have a new error type and add that error type
when the API method verify_pack_revindex() returns an error. That method
does nothing currently, but we will add checks to it in later changes.

For now, check that 'git fsck' succeeds without any errors in the normal
case. Future checks will be paired with tests that corrupt the .rev file
appropriately.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-17 14:39:04 -07:00
Taylor Blau 9f7f10a282 t: invert GIT_TEST_WRITE_REV_INDEX
Back in e8c58f894b (t: support GIT_TEST_WRITE_REV_INDEX, 2021-01-25), we
added a test knob to conditionally enable writing a ".rev" file when
indexing a pack. At the time, this was used to ensure that the test
suite worked even when ".rev" files were written, which served as a
stress-test for the on-disk reverse index implementation.

Now that reading from on-disk ".rev" files is enabled by default, the
test knob `GIT_TEST_WRITE_REV_INDEX` no longer has any meaning.

We could get rid of the option entirely, but there would be no
convenient way to test Git when ".rev" files *aren't* in place.

Instead of getting rid of the option, invert its meaning to instead
disable writing ".rev" files, thereby running the test suite in a mode
where the reverse index is generated from scratch.

This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some
spelling of "true", we are still running and exercising Git's behavior
when forced to generate reverse indexes from scratch. Do so by setting
it in the linux-TEST-vars CI run to ensure that we are maintaining good
coverage of this now-legacy code.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 07:55:46 -07:00
Taylor Blau a8dd7e05b1 config: enable pack.writeReverseIndex by default
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes,
2021-01-25), Git learned how to read and write a pack's reverse index
from a file instead of in-memory.

A pack's reverse index is a mapping from pack position (that is, the
order that objects appear together in a ".pack")  to their position in
lexical order (that is, the order that objects are listed in an ".idx"
file).

Reverse indexes are consulted often during pack-objects, as well as
during auxiliary operations that require mapping between pack offsets,
pack order, and index index.

They are useful in GitHub's infrastructure, where we have seen a
dramatic increase in performance when writing ".rev" files[1]. In
particular:

  - an ~80% reduction in the time it takes to serve fetches on a popular
    repository, Homebrew/homebrew-core.

  - a ~60% reduction in the peak memory usage to serve fetches on that
    same repository.

  - a collective savings of ~35% in CPU time across all pack-objects
    invocations serving fetches across all repositories in a single
    datacenter.

Reverse indexes are also beneficial to end-users as well as forges. For
example, the time it takes to generate a pack containing the objects for
the 10 most recent commits in linux.git (representing a typical push) is
significantly faster when on-disk reverse indexes are available:

    $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     543.0 ms ±  20.3 ms    [User: 616.2 ms, System: 58.8 ms]
      Range (min … max):   521.0 ms … 577.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     245.0 ms ±  11.4 ms    [User: 335.6 ms, System: 31.3 ms]
      Range (min … max):   226.0 ms … 259.6 ms    13 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
	2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'

The same is true of writing a pack containing the objects for the 30
most-recent commits:

    $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     866.5 ms ±  16.2 ms    [User: 1414.5 ms, System: 97.0 ms]
      Range (min … max):   839.3 ms … 886.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     581.6 ms ±  10.2 ms    [User: 1181.7 ms, System: 62.6 ms]
      Range (min … max):   567.5 ms … 599.3 ms    10 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
	1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'

...and savings on trivial operations like computing the on-disk size of
a single (packed) object are even more dramatic:

    $ git rev-parse HEAD >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):     305.8 ms ±  11.4 ms    [User: 264.2 ms, System: 41.4 ms]
      Range (min … max):   290.3 ms … 331.1 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):       4.0 ms ±   0.3 ms    [User: 1.7 ms, System: 2.3 ms]
      Range (min … max):     1.6 ms …   4.6 ms    1155 runs

    Summary
      'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
       76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'

In the more than two years since e37d0b8730 was merged, Git's
implementation of on-disk reverse indexes has been thoroughly tested,
both from users enabling `pack.writeReverseIndexes`, and from GitHub's
deployment of the feature. The latter has been running without incident
for more than two years.

This patch changes Git's behavior to write on-disk reverse indexes by
default when indexing a pack, which should make the above operations
faster for everybody's Git installation after a repack.

(The previous commit explains some potential drawbacks of using on-disk
reverse indexes in certain limited circumstances, that essentially boil
down to a trade-off between time to generate, and time to access. For
those limited cases, the `pack.readReverseIndex` escape hatch can be
used).

[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 07:55:46 -07:00
Taylor Blau dbcf611617 pack-revindex: introduce pack.readReverseIndex
Since 1615c567b8 (Documentation/config/pack.txt: advertise
'pack.writeReverseIndex', 2021-01-25), we have had the
`pack.writeReverseIndex` configuration option, which tells Git whether
or not it is allowed to write a ".rev" file when indexing a pack.

Introduce a complementary configuration knob, `pack.readReverseIndex` to
control whether or not Git will read any ".rev" file(s) that may be
available on disk.

This option is useful for debugging, as well as disabling the effect of
".rev" files in certain instances.

This is useful because of the trade-off[^1] between the time it takes to
generate a reverse index (slow from scratch, fast when reading an
existing ".rev" file), and the time it takes to access a record (the
opposite).

For example, even though it is faster to use the on-disk reverse index
when computing the on-disk size of a packed object, it is slower to
enumerate the same value for all objects.

Here are a couple of examples from linux.git. When computing the above
for a single object, using the on-disk reverse index is significantly
faster:

    $ git rev-parse HEAD >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):     302.5 ms ±  12.5 ms    [User: 258.7 ms, System: 43.6 ms]
      Range (min … max):   291.1 ms … 328.1 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):       3.9 ms ±   0.3 ms    [User: 1.6 ms, System: 2.4 ms]
      Range (min … max):     2.0 ms …   4.4 ms    801 runs

    Summary
      'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
       77.29 ± 7.14 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'

, but when instead trying to compute the on-disk object size for all
objects in the repository, using the ".rev" file is a disadvantage over
creating the reverse index from scratch:

    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects
      Time (mean ± σ):      8.258 s ±  0.035 s    [User: 7.949 s, System: 0.308 s]
      Range (min … max):    8.199 s …  8.293 s    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects
      Time (mean ± σ):     16.976 s ±  0.107 s    [User: 16.706 s, System: 0.268 s]
      Range (min … max):   16.839 s … 17.105 s    10 runs

    Summary
      'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran
	2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'

Luckily, the results when running `git cat-file` with `--unordered` are
closer together:

    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
      Time (mean ± σ):      5.066 s ±  0.105 s    [User: 4.792 s, System: 0.274 s]
      Range (min … max):    4.943 s …  5.220 s    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
      Time (mean ± σ):      6.193 s ±  0.069 s    [User: 5.937 s, System: 0.255 s]
      Range (min … max):    6.145 s …  6.356 s    10 runs

    Summary
      'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran
        1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'

Because the equilibrium point between these two is highly machine- and
repository-dependent, allow users to configure whether or not they will
read any ".rev" file(s) with this configuration knob.

[^1]: Generating a reverse index in memory takes O(N) time (where N is
  the number of objects in the repository), since we use a radix sort.
  Reading an entry from an on-disk ".rev" file is slower since each
  operation is bound by disk I/O instead of memory I/O.

  In order to compute the on-disk size of a packed object, we need to
  find the offset of our object, and the adjacent object (the on-disk
  size difference of these two). Finding the first offset requires a
  binary search. Finding the latter involves a single .rev lookup.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 07:55:46 -07:00
Taylor Blau b77919ed6e t5325: mark as leak-free
This test is leak-free as of the previous commit, so let's mark it as
such to ensure we don't regress and introduce a leak in the future.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 07:55:45 -07:00
Junio C Hamano 063cd850f2 Merge branch 'jk/use-perl-path-consistently'
Tests had a few places where we ignored PERL_PATH and blindly used
/usr/bin/perl, which have been corrected.

* jk/use-perl-path-consistently:
  t/lib-httpd: pass PERL_PATH to CGI scripts
2023-04-11 13:49:13 -07:00
Junio C Hamano 96f4113ac0 Merge branch 'jc/clone-object-format-from-void'
"git clone" from an empty repository learned to propagate the
choice of the hash algorithm from the source repository to the
newly created repository.

* jc/clone-object-format-from-void:
  clone: propagate object-format when cloning from void
2023-04-11 13:49:13 -07:00
Junio C Hamano 30e04bcfa8 Merge branch 'ar/adjust-tests-for-the-index-fallout'
Comment updates.

* ar/adjust-tests-for-the-index-fallout:
  t2107: fix mention of the_index.cache_changed
  t3060: fix mention of function prune_index
2023-04-11 13:49:12 -07:00
Junio C Hamano 647a2bb3ff Merge branch 'jc/spell-id-in-both-caps-in-message-id'
Consistently spell "Message-ID" as such, not "Message-Id".

* jc/spell-id-in-both-caps-in-message-id:
  e-mail workflow: Message-ID is spelled with ID in both capital letters
2023-04-11 13:49:12 -07:00
Junio C Hamano d02343b599 Merge branch 'ws/sparse-check-rules'
"git sparse-checkout" command learns a debugging aid for the sparse
rule definitions.

* ws/sparse-check-rules:
  builtin/sparse-checkout: add check-rules command
  builtin/sparse-checkout: remove NEED_WORK_TREE flag
2023-04-11 13:49:12 -07:00
Junio C Hamano 89833fc249 Merge branch 'ds/fetch-bundle-uri-with-all'
"git fetch --all" does not have to download and handle the same
bundleURI over and over, which has been corrected.

* ds/fetch-bundle-uri-with-all:
  fetch: download bundles once, even with --all
2023-04-06 13:38:32 -07:00
Junio C Hamano 0b94009649 Merge branch 'jk/chainlint-fixes'
Test framework fix.

* jk/chainlint-fixes:
  tests: skip test_eval_ in internal chain-lint
  tests: drop here-doc check from internal chain-linter
  tests: diagnose unclosed here-doc in chainlint.pl
  tests: replace chainlint subshell with a function
  tests: run internal chain-linter under "make test"
2023-04-06 13:38:31 -07:00
Junio C Hamano 6047b28eb7 Merge branch 'en/header-split-cleanup'
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.

* en/header-split-cleanup:
  csum-file.h: remove unnecessary inclusion of cache.h
  write-or-die.h: move declarations for write-or-die.c functions from cache.h
  treewide: remove cache.h inclusion due to setup.h changes
  setup.h: move declarations for setup.c functions from cache.h
  treewide: remove cache.h inclusion due to environment.h changes
  environment.h: move declarations for environment.c functions from cache.h
  treewide: remove unnecessary includes of cache.h
  wrapper.h: move declarations for wrapper.c functions from cache.h
  path.h: move function declarations for path.c functions from cache.h
  cache.h: remove expand_user_path()
  abspath.h: move absolute path functions from cache.h
  environment: move comment_line_char from cache.h
  treewide: remove unnecessary cache.h inclusion from several sources
  treewide: remove unnecessary inclusion of gettext.h
  treewide: be explicit about dependence on gettext.h
  treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06 13:38:31 -07:00
Junio C Hamano 72871b198f Merge branch 'ab/remove-implicit-use-of-the-repository'
Code clean-up around the use of the_repository.

* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-06 13:38:30 -07:00
Junio C Hamano 06e9e726d4 Merge branch 'gc/config-parsing-cleanup'
Config API clean-up to reduce its dependence on static variables

* gc/config-parsing-cleanup:
  config.c: rename "struct config_source cf"
  config: report cached filenames in die_bad_number()
  config.c: remove current_parsing_scope
  config.c: remove current_config_kvi
  config.c: plumb the_reader through callbacks
  config.c: create config_reader and the_reader
  config.c: don't assign to "cf_global" directly
  config.c: plumb config_source through static fns
2023-04-06 13:38:29 -07:00
Junio C Hamano 87daf40750 Merge branch 'ab/config-multi-and-nonbool'
Assorted config API updates.

* ab/config-multi-and-nonbool:
  for-each-repo: with bad config, don't conflate <path> and <cmd>
  config API: add "string" version of *_value_multi(), fix segfaults
  config API users: test for *_get_value_multi() segfaults
  for-each-repo: error on bad --config
  config API: have *_multi() return an "int" and take a "dest"
  versioncmp.c: refactor config reading next commit
  config API: add and use a "git_config_get()" family of functions
  config tests: add "NULL" tests for *_get_value_multi()
  config tests: cover blind spots in git_die_config() tests
2023-04-06 13:38:29 -07:00
Junio C Hamano 955abf5f72 Merge branch 'jk/unused-post-2.40-part2'
Code clean-up for "-Wunused-parameter" build.

* jk/unused-post-2.40-part2:
  parse-options: drop parse_opt_unknown_cb()
  t/helper: mark unused argv/argc arguments
  mark "argv" as unused when we check argc
  builtins: mark unused prefix parameters
  builtins: annotate always-empty prefix parameters
  builtins: always pass prefix to parse_options()
  fast-import: fix file access when run from subdir
2023-04-06 13:38:28 -07:00
Junio C Hamano 119e82a515 Merge branch 'ps/ahead-behind-truncation-fix'
Fix unnecessary truncation of generation numbers used in-core.

* ps/ahead-behind-truncation-fix:
  commit-graph: fix truncated generation numbers
2023-04-06 13:38:27 -07:00
Junio C Hamano 7727da99df Merge branch 'ds/ahead-behind'
"git for-each-ref" learns '%(ahead-behind:<base>)' that computes the
distances from a single reference point in the history with bunch
of commits in bulk.

* ds/ahead-behind:
  commit-reach: add tips_reachable_from_bases()
  for-each-ref: add ahead-behind format atom
  commit-reach: implement ahead_behind() logic
  commit-graph: introduce `ensure_generations_valid()`
  commit-graph: return generation from memory
  commit-graph: simplify compute_generation_numbers()
  commit-graph: refactor compute_topological_levels()
  for-each-ref: explicitly test no matches
  for-each-ref: add --stdin option
2023-04-06 13:38:21 -07:00
Jeff King c1917156a0 t/lib-httpd: pass PERL_PATH to CGI scripts
As discussed in t/README, tests should aim to use PERL_PATH rather than
straight "perl". We usually do this automatically with a "perl" function
in test-lib.sh, but a few cases need to be handled specially.

One such case is the apply-one-time-perl.sh CGI, which invokes plain
"perl". It should be using $PERL_PATH, but to make that work, we must
also instruct Apache to pass through the variable.

Prior to this patch, doing:

  mv /usr/bin/perl /usr/bin/my-perl
  make PERL_PATH=/usr/bin/my-perl test

would fail t5702, t5703, and t5616. After this it passes. This is a
pretty extreme case, as even if you install perl elsewhere, you'd likely
still have it in your $PATH. A more realistic case is that you don't
want to use the perl in your $PATH (because it's older, broken, etc) and
expect PERL_PATH to consistently override that (since that's what it's
documented to do). Removing it completely is just a convenient way of
completely breaking it for testing purposes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06 09:29:43 -07:00
Junio C Hamano 8b214c2e9d clone: propagate object-format when cloning from void
A user could prepare an empty repository and set it to use SHA256 as
the object format.  The new repository created by "git clone" from
such a repository however would not record that it is expecting
objects in the same SHA256 format.  This works as expected if the
source repository is not empty.

Just like we started copying the name of the primary branch from the
remote repository even if it is unborn in 3d8314f8 (clone: propagate
empty remote HEAD even with other branches, 2022-07-07), lift the
code that records the object format out of the block executed only
when cloning from an instantiated repository, so that it works also
when cloning from an empty repository.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-05 14:17:00 -07:00
Junio C Hamano 45602dd029 Merge branch 'ar/test-cleanup-unused-file-creation'
Test clean-up.

* ar/test-cleanup-unused-file-creation:
  t1507: assert output of rev-parse
  t1404: don't create unused file
  t1400: assert output of update-ref
  t1302: don't create unused file
  t1010: don't create unused files
  t1006: assert error output of cat-file
  t1005: assert output of ls-files
2023-04-04 14:28:29 -07:00
Junio C Hamano 62df03c277 Merge branch 'jk/blame-contents-with-arbitrary-commit'
"git blame --contents=<file> <rev> -- <path>" used to be forbidden,
but now it finds the origins of lines starting at <file> contents
through the history that leads to <rev>.

* jk/blame-contents-with-arbitrary-commit:
  blame: allow --contents to work with non-HEAD commit
2023-04-04 14:28:28 -07:00
Junio C Hamano 6dd9d96129 Merge branch 'rs/archive-mtime'
Test update.

* rs/archive-mtime:
  t5000: use check_mtime()
2023-04-04 14:28:28 -07:00
Junio C Hamano 9142fce9b0 Merge branch 'ah/rebase-merges-config'
Streamline --rebase-merges command line option handling and
introduce rebase.merges configuration variable.

* ah/rebase-merges-config:
  rebase: add a config option for --rebase-merges
  rebase: deprecate --rebase-merges=""
  rebase: add documentation and test for --no-rebase-merges
2023-04-04 14:28:28 -07:00
Junio C Hamano 7e13d654c2 Merge branch 'jk/fast-export-cleanup'
Code clean-up.

* jk/fast-export-cleanup:
  fast-export: drop unused parameter from anonymize_commit_message()
  fast-export: drop data parameter from anonymous generators
  fast-export: de-obfuscate --anonymize-map handling
  fast-export: factor out anonymized_entry creation
  fast-export: simplify initialization of anonymized hashmaps
  fast-export: drop const when storing anonymized values
2023-04-04 14:28:27 -07:00
Junio C Hamano f315a8b609 Merge branch 'js/split-index-fixes'
The index files can become corrupt under certain conditions when
the split-index feature is in use, especially together with
fsmonitor, which have been corrected.

* js/split-index-fixes:
  unpack-trees: take care to propagate the split-index flag
  fsmonitor: avoid overriding `cache_changed` bits
  split-index; stop abusing the `base_oid` to strip the "link" extension
  split-index & fsmonitor: demonstrate a bug
2023-04-04 14:28:27 -07:00
Junio C Hamano f834089925 Merge branch 'pw/wildmatch-fixes'
The wildmatch library code unlearns exponential behaviour it
acquired some time ago since it was borrowed from rsync.

* pw/wildmatch-fixes:
  t3070: make chain lint tester happy
  wildmatch: hide internal return values
  wildmatch: avoid undefined behavior
  wildmatch: fix exponential behavior
2023-04-04 14:28:27 -07:00
Junio C Hamano ba4324c4e1 e-mail workflow: Message-ID is spelled with ID in both capital letters
We used to write "Message-Id:" and "Message-ID:" pretty much
interchangeably, and the header name is defined to be case
insensitive by the RFCs, but the canonical form "Message-ID:" is
used throughout the RFC documents, so let's imitate it ourselves.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
2023-04-03 08:55:43 -07:00
Junio C Hamano 290a973bb9 Merge branch 'ds/p2000-fix-grep-sparse'
Fix perf test.

* ds/p2000-fix-grep-sparse:
  p2000: remove stray '--sparse' flag from test
2023-03-31 17:50:23 -07:00
Junio C Hamano 0d865049f7 Merge branch 'ab/retire-scripted-add-p'
Test fix.

* ab/retire-scripted-add-p:
  t3701: we don't need no Perl for `add -i` anymore
2023-03-31 17:50:23 -07:00
Junio C Hamano dd88a1af1a Merge branch 'js/t5563-portability-fix'
Test portability fix.

* js/t5563-portability-fix:
  t5563: prevent "ambiguous redirect"
2023-03-31 17:50:23 -07:00
Andrei Rybak 1ec40a83a5 t2107: fix mention of the_index.cache_changed
Commit [1] added a test to t2107-update-index-basic.sh with a comment
that mentions macro "active_cache_changed".  Later in [2], the macro was
removed and its usage in function cmd_update_index in file
builtin/update-index.c was replaced with "the_index.cache_changed".

Fix the outdated comment in file t2107-update-index-basic.sh.

[1] fa137f67a4 (lockfile.c: store absolute path, 2014-11-02)
[2] dc594180d9 (cocci & cache.h: apply variable section of "pending"
    index-compatibility, 2022-11-19)

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-31 16:57:04 -07:00
Andrei Rybak 993d7085be t3060: fix mention of function prune_index
Commit [1] added tests which trigger function prune_cache.  The comments
in these tests, however, incorrectly call it "prune_path".  Since then,
function "prune_cache" has been renamed to "prune_index" in commit [2].
Later still in commit [3], the_index singleton, which is also mentioned
in a comment, stopped being used directly with function "prune_index".

Fix mentions of function "prune_index" and the struct it changes in
comments in file "t3060-ls-files-with-tree.sh".

[1] 54e1abce90 (Add test case for ls-files --with-tree, 2007-10-03)
[2] 6510ae173a (ls-files: convert prune_cache to take an index,
    2017-06-12)
[3] 188dce131f (ls-files: use repository object, 2017-06-22)

Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-31 16:57:03 -07:00
Derrick Stolee 25bccb4b79 fetch: download bundles once, even with --all
When fetch.bundleURI is set, 'git fetch' downloads bundles from the
given bundle URI before fetching from the specified remote. However,
when using non-file remotes, 'git fetch --all' will launch 'git fetch'
subprocesses which then read fetch.bundleURI and fetch the bundle list
again. We do not expect the bundle list to have new information during
these multiple runs, so avoid these extra calls by un-setting
fetch.bundleURI in the subprocess arguments.

Be careful to skip fetching bundles for the empty bundle string.
Fetching bundles from the empty list presents some interesting test
failures.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-31 10:07:33 -07:00
Johannes Schindelin 92c7b3d473 t5563: prevent "ambiguous redirect"
When I ran this test using `TEST_SHELL_PATH=/bin/bash` in my Ubuntu
setup (where Bash is at version 5.0.17(1)-release), I was greeted with
this error message:

	./test-lib.sh: line 1072: $CHALLENGE: ambiguous redirect

This commit fixes that error by quoting the `CHALLENGE` variable (which
has as value a path containing spaces), and by avoiding to cuddle the
empty string parameter in the `printf` call with the redirect character
(in fact, the `printf ''>$CHALLENGE` is removed because the next line
overwrites the file anyway because it _also_ uses a single `>` to
redirect the output).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-31 08:50:30 -07:00
Jeff King cc48ddd937 tests: skip test_eval_ in internal chain-lint
To check for broken &&-chains, we run "fail_117 && $1" as a test
snippet, and check the exit code. We use test_eval_ to do so, because
that's the way we run the actual test.

But we don't need any of its niceties, like "set -x" tracing. In fact,
they hinder us, because we have to explicitly disable them. So let's
skip that and use "eval" more directly, which is simpler. I had hoped it
would also be faster, but it doesn't seem to produce a measurable
improvement (probably because it's just running internal shell commands,
with no subshells or forks).

Note that there is one gotcha: even though we don't intend to run any of
the commands if the &&-chain is intact, an error like this:

   test_expect_success 'broken' '
	# this next line breaks the &&-chain
	true
	# and then this one is executed even by the linter
	return 1
   '

means we'll "return 1" from the eval, and thus from test_run_(). We
actually do notice this in test_expect_success, but only by saying "hey,
this test didn't say it was OK, so it must have failed", which is not
right (it should say "broken &&-chain").

We can handle this by calling test_eval_inner_() instead, which is our
trick for wrapping "return" in a test snippet. But to do that, we have
to push the trace code out of that inner function and into test_eval_().
This is arguably where it belonged in the first place, but it never
mattered because the "inner_" function had only one caller.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-30 13:07:29 -07:00
Jeff King 750b260411 tests: drop here-doc check from internal chain-linter
Commit 99a64e4b73 (tests: lint for run-away here-doc, 2017-03-22)
tweaked the chain-lint test to catch unclosed here-docs. It works by
adding an extra "echo" command after the test snippet, and checking that
it is run (if it gets swallowed by a here-doc, naturally it is not run).

The downside here is that we introduced an extra $() substitution, which
happens in a subshell. This has a measurable performance impact when
run for many tests.

The tradeoff in safety was undoubtedly worth it when 99a64e4b73 was
written. But since the external chainlint.pl learned to find these
recently, we can just rely on it. By switching back to a simpler
chain-lint, hyperfine reports a measurable speedup on t3070 (which has
1800 tests):

  'HEAD' ran
    1.12 ± 0.01 times faster than 'HEAD~1'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-30 13:07:29 -07:00
Eric Sunshine 2b61c8dc88 tests: diagnose unclosed here-doc in chainlint.pl
An unclosed here-doc in a test is a problem, because it silently gobbles
up any remaining commands. Since 99a64e4b73 (tests: lint for run-away
here-doc, 2017-03-22) we detect this by piggy-backing on the internal
chainlint checker in test-lib.sh.

However, it would be nice to detect it in chainlint.pl, for a few
reasons:

  - the output from chainlint.pl is much nicer; it can show the exact
    spot of the error, rather than a vague "somewhere in this test you
    broke the &&-chain or had a bad here-doc" message.

  - the implementation in test-lib.sh runs for each test snippet. And
    since it requires a subshell, the extra cost is small but not zero.
    If chainlint.pl can reliably find the problem, we can optimize the
    test-lib.sh code.

The chainlint.pl code never intended to find here-doc problems. But
since it has to parse them anyway (to avoid reporting problems inside
here-docs), most of what we need is already there. We can detect the
problem when we fail to find the missing end-tag in swallow_heredocs().
The extra change in scan_heredoc_tag() stores the location of the start
of the here-doc, which lets us mark it as the source of the error in the
output (see the new tests for examples).

[jk: added commit message and tests]

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-30 13:07:29 -07:00
Jeff King 1686de55fa tests: replace chainlint subshell with a function
To test that we don't break the &&-chain, test-lib.sh does something
like:

   (exit 117) && $test_commands

and checks that the result is exit code 117. We don't care what that
initial command is, as long as it exits with a unique code. Using "exit"
works and is simple, but is a bit expensive since it requires a subshell
(to avoid exiting the whole script!). This isn't usually very
noticeable, but it can add up for scripts which have a large number of
tests.

Using "return" naively won't work here, because we'd return from the
function eval-ing the snippet (and it wouldn't find &&-chain breakages).
But if we further push that into its own function, it does exactly what
we want, without extra subshell overhead.

According to hyperfine, this produces a measurable improvement when
running t3070 (which has 1800 tests, all of them quite short):

  'HEAD' ran
    1.09 ± 0.01 times faster than 'HEAD~1'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-30 13:07:29 -07:00
Jeff King 7b6555ab8d tests: run internal chain-linter under "make test"
Since 69b9924b87 (t/Makefile: teach `make test` and `make prove` to run
chainlint.pl, 2022-09-01), we run a single chainlint.pl process for all
scripts, and then instruct each individual script to run with the
equivalent of --no-chain-lint, which tells them not to redundantly run
the chainlint script themselves.

However, this also disables the internal linter run within the shell by
eval-ing "(exit 117) && $1" and confirming we get code 117. In theory
the external linter produces a superset of complaints, and we don't need
the internal one anymore. However, we know there is at least one case
where they differ. A test like:

	test_expect_success 'should fail linter' '
		false &&
		sleep 2 &
		pid=$! &&
		kill $pid
	'

is buggy (it ignores the failure from "false", because it is
backgrounded along with the sleep). The internal linter catches this,
but the external one doesn't (and teaching it to do so is complicated[1]).
So not only does "make test" miss this problem, but it's doubly
confusing because running the script standalone does complain.

Let's teach the suppression in the Makefile to only turn off the
external linter (which we know is redundant, as it was already run) and
leave the internal one intact.

I've used a new environment variable to do this here, and intentionally
did not add a "--no-ext-chain-lint" option. This is an internal
optimization used by the Makefile, and not something that ordinary users
would need to tweak.

[1] For discussion of chainlint.pl and this case, see:
    https://lore.kernel.org/git/CAPig+cQtLFX4PgXyyK_AAkCvg4Aw2RAC5MmLbib-aHHgTBcDuw@mail.gmail.com/

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-30 13:07:29 -07:00
Jeff King 126e3b3d2a t/helper: mark unused argv/argc arguments
Many test helper programs do not bother to look at argc or argv, because
they don't take any options. In a user-facing program, it's a good idea
to check for unexpected arguments and complain. But for a test helper,
it's not worth the trouble to enforce this.

But we do want to tell the compiler we're OK with ignoring them, to
silence -Wunused-parameter (and obviously we can't get rid of them,
since we have to conform to the usual cmd__foo() interface).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 14:11:24 -07:00
Jeff King 9dc607f1c2 fast-import: fix file access when run from subdir
In cmd_fast_import(), we ignore the "prefix" argument entirely, even
though it tells us how we may have changed directory to the root of the
repository earlier in the process. Which means that if you run it from a
subdir and point to paths in the filesystem, like:

  cd subdir
  git fast-import --import-marks=foo <dump

then it will look for "foo" in the root of the repository, not the
current directory ("subdir/") which the user would have expected.

We can fix this by recording the prefix and using it as appropriate
whenever we open a file for reading or writing. I found each of these by
looking for cases where we call fopen() within fast-import.c, so this
should cover all cases. The new test triggers each one, as well as
making sure we don't accidentally apply the prefix when --relative-marks
is in use (since that option interprets some paths as relative to a
specific directory).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 14:11:24 -07:00
Derrick Stolee d52fcf493b p2000: remove stray '--sparse' flag from test
This argument was added in 7cae7627c4 (builtin/grep.c: integrate with
sparse index, 2022-09-22), but it was a carry-over from an earlier
version where the --sparse flag was added to the 'git grep' builtin.
This argument does not exist, so currently the
p2000-sparse-operations.sh performance test script fails when reaching
this step.

With this fix, the script works with these numbers for my copy of the
Git source code repository:

Test                                         HEAD
------------------------------------------------------------
2000.30: git grep --cached ... (full-v3)     0.34(1.20+0.14)
2000.31: git grep --cached ... (full-v4)     0.31(1.15+0.13)
2000.32: git grep --cached ... (sparse-v3)   0.26(1.13+0.12)
2000.33: git grep --cached ... (sparse-v4)   0.27(1.13+0.12)

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 13:25:52 -07:00
Glen Choo e2016508e7 config: report cached filenames in die_bad_number()
If, when parsing numbers from config, die_bad_number() is called, it
reports the filename and config source type if we were parsing a config
file, but not if we were iterating a config_set (it defaults to a less
specific error message). Most call sites don't parse config files
because config is typically read once and cached, so we only report
filename and config source type in "git config --type" (since "git
config" always parses config files).

This could have been fixed when we taught the current_config_*
functions to respect config_set values (0d44a2dacc (config: return
configset value for current_config_ functions, 2016-05-26), but it was
hard to spot then and we might have just missed it (I didn't find
mention of die_bad_number() in the original ML discussion [1].)

Fix this by refactoring the current_config_* functions into variants
that don't BUG() when we aren't reading config, and using the resulting
functions in die_bad_number(). "git config --get[-regexp] --type=int"
cannot use the non-refactored version because it parses the int value
_after_ parsing the config file, which would run into the BUG().

Since the refactored functions aren't public, they use "struct
config_reader".

1. https://lore.kernel.org/git/20160518223712.GA18317@sigill.intra.peff.net/

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 13:03:27 -07:00
Junio C Hamano f879501ad0 Merge branch 'jk/fix-proto-downgrade-to-v0'
Transports that do not support protocol v2 did not correctly fall
back to protocol v0 under certain conditions, which has been
corrected.

* jk/fix-proto-downgrade-to-v0:
  git_connect(): fix corner cases in downgrading v2 to v0
2023-03-28 10:51:52 -07:00
Junio C Hamano 8069aa01cd Merge branch 'fc/oid-quietly-parse-upstream'
"git rev-parse --quiet foo@{u}", or anything that asks @{u} to be
parsed with GET_OID_QUIETLY option, did not quietly fail, which has
been corrected.

* fc/oid-quietly-parse-upstream:
  object-name: fix quiet @{u} parsing
2023-03-28 10:51:52 -07:00
Junio C Hamano 6041a13ec2 Merge branch 'fc/completion-colors-do-not-need-prompt-command'
Lift the limitation that colored prompts can only be used with
PROMPT_COMMAND mode.

* fc/completion-colors-do-not-need-prompt-command:
  completion: prompt: use generic colors
2023-03-28 10:51:52 -07:00
Ævar Arnfjörð Bjarmason 3611f7467f for-each-repo: with bad config, don't conflate <path> and <cmd>
Fix a logic error in 4950b2a2b5 (for-each-repo: run subcommands on
configured repos, 2020-09-11). Due to assuming that elements returned
from the repo_config_get_value_multi() call wouldn't be "NULL" we'd
conflate the <path> and <command> part of the argument list when
running commands.

As noted in the preceding commit the fix is to move to a safer
"*_string_multi()" version of the *_multi() API. This change is
separated from the rest because those all segfaulted. In this change
we ended up with different behavior.

When using the "--config=<config>" form we take each element of the
list as a path to a repository. E.g. with a configuration like:

	[repo] list = /some/repo

We would, with this command:

	git for-each-repo --config=repo.list status builtin

Run a "git status" in /some/repo, as:

	git -C /some/repo status builtin

I.e. ask "status" to report on the "builtin" directory. But since a
configuration such as this would result in a "struct string_list *"
with one element, whose "string" member is "NULL":

	[repo] list

We would, when constructing our command-line in
"builtin/for-each-repo.c"...

	strvec_pushl(&child.args, "-C", path, NULL);
	for (i = 0; i < argc; i++)
		strvec_push(&child.args, argv[i]);

...have that "path" be "NULL", and as strvec_pushl() stops when it
sees NULL we'd end with the first "argv" element as the argument to
the "-C" option, e.g.:

	git -C status builtin

I.e. we'd run the command "builtin" in the "status" directory.

In another context this might be an interesting security
vulnerability, but I think that this amounts to a nothingburger on
that front.

A hypothetical attacker would need to be able to write config for the
victim to run, if they're able to do that there's more interesting
attack vectors. See the "safe.directory" facility added in
8d1a744820 (setup.c: create `safe.bareRepository`, 2022-07-14).

An even more unlikely possibility would be an attacker able to
generate the config used for "for-each-repo --config=<key>", but
nothing else (e.g. an automated system producing that list).

Even in that case the attack vector is limited to the user running
commands whose name matches a directory that's interesting to the
attacker (e.g. a "log" directory in a repository). The second
argument (if any) of the command is likely to make git die without
doing anything interesting (e.g. "-p" to "log", there being no "-p"
built-in command to run).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason 9e2d884d0f config API: add "string" version of *_value_multi(), fix segfaults
Fix numerous and mostly long-standing segfaults in consumers of
the *_config_*value_multi() API. As discussed in the preceding commit
an empty key in the config syntax yields a "NULL" string, which these
users would give to strcmp() (or similar), resulting in segfaults.

As this change shows, most users users of the *_config_*value_multi()
API didn't really want such an an unsafe and low-level API, let's give
them something with the safety of git_config_get_string() instead.

This fix is similar to what the *_string() functions and others
acquired in[1] and [2]. Namely introducing and using a safer
"*_get_string_multi()" variant of the low-level "_*value_multi()"
function.

This fixes segfaults in code introduced in:

  - d811c8e17c (versionsort: support reorder prerelease suffixes, 2015-02-26)
  - c026557a37 (versioncmp: generalize version sort suffix reordering, 2016-12-08)
  - a086f921a7 (submodule: decouple url and submodule interest, 2017-03-17)
  - a6be5e6764 (log: add log.excludeDecoration config option, 2020-04-16)
  - 92156291ca (log: add default decoration filter, 2022-08-05)
  - 50a044f1e4 (gc: replace config subprocesses with API calls, 2022-09-27)

There are now two users ofthe low-level API:

- One in "builtin/for-each-repo.c", which we'll convert in a
  subsequent commit.

- The "t/helper/test-config.c" code added in [3].

As seen in the preceding commit we need to give the
"t/helper/test-config.c" caller these "NULL" entries.

We could also alter the underlying git_configset_get_value_multi()
function to be "string safe", but doing so would leave no room for
other variants of "*_get_value_multi()" that coerce to other types.

Such coercion can't be built on the string version, since as we've
established "NULL" is a true value in the boolean context, but if we
coerced it to "" for use in a list of strings it'll be subsequently
coerced to "false" as a boolean.

The callback pattern being used here will make it easy to introduce
e.g. a "multi" variant which coerces its values to "bool", "int",
"path" etc.

1. 40ea4ed903 (Add config_error_nonbool() helper function,
   2008-02-11)
2. 6c47d0e8f3 (config.c: guard config parser from value=NULL,
   2008-02-11).
3. 4c715ebb96 (test-config: add tests for the config_set API,
   2014-07-28)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:37:53 -07:00