Commit graph

18257 commits

Author SHA1 Message Date
Junio C Hamano 4c90d8908a Merge branch 'jn/log-m-does-not-imply-p'
Earlier "git log -m" was changed to always produce patch output,
which would break existing scripts, which has been reverted.

* jn/log-m-does-not-imply-p:
  Revert 'diff-merges: let "-m" imply "-p"'
2021-08-11 12:36:18 -07:00
Jonathan Nieder 6a38e33331 Revert 'diff-merges: let "-m" imply "-p"'
This reverts commit f5bfcc823b, which
made "git log -m" imply "--patch" by default.  The logic was that
"-m", which makes diff generation for merges perform a diff against
each parent, has no use unless I am viewing the diff, so we could save
the user some typing by turning on display of the resulting diff
automatically.  That wasn't expected to adversely affect scripts
because scripts would either be using a command like "git diff-tree"
that already emits diffs by default or would be combining -m with a
diff generation option such as --name-status.  By saving typing for
interactive use without adversely affecting scripts in the wild, it
would be a pure improvement.

The problem is that although diff generation options are only relevant
for the displayed diff, a script author can imagine them affecting
path limiting.  For example, I might run

	git log -w --format=%H -- README

hoping to list commits that edited README, excluding whitespace-only
changes.  In fact, a whitespace-only change is not TREESAME so the use
of -w here has no effect (since we don't apply these diff generation
flags to the diff_options struct rev_info::pruning used for this
purpose), but the documentation suggests that it should work

	Suppose you specified foo as the <paths>. We shall call
	commits that modify foo !TREESAME, and the rest TREESAME. (In
	a diff filtered for foo, they look different and equal,
	respectively.)

and a script author who has not tested whitespace-only changes
wouldn't notice.

Similarly, a script author could include

	git log -m --first-parent --format=%H -- README

to filter the first-parent history for commits that modified README.
The -m is a no-op but it reflects the script author's intent.  For
example, until 1e20a407fe (stash list: stop passing "-m" to "git
log", 2021-05-21), "git stash list" did this.

As a result, we can't safely change "-m" to imply "-p" without fear of
breaking such scripts.  Restore the previous behavior.

Noticed because Rust's src/bootstrap/bootstrap.py made use of this
same construct: https://github.com/rust-lang/rust/pull/87513.  That
script has been updated to omit the unnecessary "-m" option, but we
can expect other scripts in the wild to have similar expectations.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-09 13:52:01 -07:00
Junio C Hamano aa7d2fe355 Merge branch 'cb/t7508-regexp-fix'
* cb/t7508-regexp-fix:
  t7508: avoid non POSIX BRE
2021-08-06 12:52:22 -07:00
Junio C Hamano c87977a0c5 Merge branch 'fc/disable-checkwinsize'
* fc/disable-checkwinsize:
  test: fix for COLUMNS and bash 5
2021-08-06 12:50:26 -07:00
Felipe Contreras 390b44eb2b test: fix for COLUMNS and bash 5
Since c49a177bec (test-lib.sh: set COLUMNS=80 for --verbose
repeatability, 2021-06-29) multiple tests have been failing when using
bash 5 because checkwinsize is enabled by default, therefore COLUMNS is
reset using TIOCGWINSZ even for non-interactive shells.

It's debatable whether or not bash should even be doing that, but for
now we can avoid this undesirable behavior by disabling this option.

Reported-by: Fabian Stelzer <fabian.stelzer@campoint.net>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
[jc: with SZEDER Gábor's suggestion to do this before setting COLUMNS]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-06 09:59:55 -07:00
Junio C Hamano 099a64aa39 Merge branch 'tb/mingw-rmdir-symlink-to-directory'
Windows rmdir() equivalent behaves differently from POSIX ones in
that when used on a symbolic link that points at a directory, the
target directory gets removed, which has been corrected.

* tb/mingw-rmdir-symlink-to-directory:
  mingw: align symlinks-related rmdir() behavior with Linux
2021-08-04 13:28:56 -07:00
Junio C Hamano fea3738ac5 Merge branch 'ab/getcwd-test'
Portability test update.

* ab/getcwd-test:
  t0001: fix broken not-quite getcwd(3) test in bed67874e2
2021-08-04 13:28:55 -07:00
Junio C Hamano 5fef3b15db Merge branch 'pb/merge-autostash-more'
The local changes stashed by "git merge --autostash" were lost when
the merge failed in certain ways, which has been corrected.

* pb/merge-autostash-more:
  merge: apply autostash if merge strategy fails
  merge: apply autostash if fast-forward fails
  Documentation: define 'MERGE_AUTOSTASH'
  merge: add missing word "strategy" to a message
2021-08-04 13:28:54 -07:00
Junio C Hamano 1a6fb019d6 Merge branch 'en/ort-perf-batch-14'
Further optimization on "merge -sort" backend.

* en/ort-perf-batch-14:
  merge-ort: restart merge with cached renames to reduce process entry cost
  merge-ort: avoid recursing into directories when we don't need to
  merge-ort: defer recursing into directories when merge base is matched
  merge-ort: add a handle_deferred_entries() helper function
  merge-ort: add data structures for allowable trivial directory resolves
  merge-ort: add some more explanations in collect_merge_info_callback()
  merge-ort: resolve paths early when we have sufficient information
2021-08-04 13:28:54 -07:00
Junio C Hamano 506d2a354a Merge branch 'ds/commit-and-checkout-with-sparse-index'
"git checkout" and "git commit" learn to work without unnecessarily
expanding sparse indexes.

* ds/commit-and-checkout-with-sparse-index:
  unpack-trees: resolve sparse-directory/file conflicts
  t1092: document bad 'git checkout' behavior
  checkout: stop expanding sparse indexes
  sparse-index: recompute cache-tree
  commit: integrate with sparse-index
  p2000: compress repo names
  p2000: add 'git checkout -' test and decrease depth
2021-08-04 13:28:53 -07:00
Junio C Hamano 10f57e0eb9 Merge branch 'ar/submodule-add'
Rewrite of "git submodule" in C continues.

* ar/submodule-add:
  submodule: drop unused sm_name parameter from show_fetch_remotes()
  submodule--helper: introduce add-clone subcommand
  submodule--helper: refactor module_clone()
  submodule: prefix die messages with 'fatal'
  t7400: test failure to add submodule in tracked path
2021-08-04 13:28:52 -07:00
Thomas Bétous 3e7d4888e5 mingw: align symlinks-related rmdir() behavior with Linux
When performing a rebase, rmdir() is called on the folder .git/logs. On
Unix rmdir() exits without deleting anything in case .git/logs is a
symbolic link but the equivalent functions on Windows (_rmdir, _wrmdir
and RemoveDirectoryW) do not behave the same and remove the folder if it
is symlinked even if it is not empty.

This creates issues when folders in .git/ are symlinks which is
especially the case when git-repo[1] is used: It replaces `.git/logs/`
with a symlink.

One such issue is that the _target_ of that symlink is removed e.g.
during a `git rebase`, where `delete_reflog("REBASE_HEAD")` will not
only try to remove `.git/logs/REBASE_HEAD` but then recursively try to
remove the parent directories until an error occurs, a technique that
obviously relies on `rmdir()` refusing to remove a symlink.

This was reported in https://github.com/git-for-windows/git/issues/2967.

This commit updates mingw_rmdir() so that its behavior is the same as
Linux rmdir() in case of symbolic links.

To verify that Git does not regress on the reported issue, this patch
adds a regression test for the `git rebase` symptom, even if the same
`rmdir()` behavior is quite likely to cause potential problems in other
Git commands as well.

[1]: git-repo is a python tool built on top of Git which helps manage
many Git repositories. It stores all the .git/ folders in a central
place by taking advantage of symbolic links.
More information: https://gerrit.googlesource.com/git-repo/

Signed-off-by: Thomas Bétous <tomspycell@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-02 15:10:58 -07:00
Carlo Marcelo Arenas Belón 4da8b2fcd4 t7508: avoid non POSIX BRE
24c30e0b6 (wt-status: tolerate dangling marks, 2020-09-01) adds a test
that uses a BRE which breaks at least with OpenBSD's grep.

switch to an ERE as it is done for similar checks and while at it, remove
the now obsolete test_i18ngrep call.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-02 15:05:23 -07:00
Junio C Hamano 8230107f33 Merge branch 'jt/bulk-prefetch'
"git read-tree" had a codepath where blobs are fetched one-by-one
from the promisor remote, which has been corrected to fetch in bulk.

* jt/bulk-prefetch:
  cache-tree: prefetch in partial clone read-tree
  unpack-trees: refactor prefetching code
2021-08-02 14:06:42 -07:00
Junio C Hamano 107687b5af Merge branch 'ab/bundle-tests'
"git bundle" gained more test coverage.

* ab/bundle-tests:
  bundle tests: use test_cmp instead of grep
  bundle tests: use ">file" not ": >file"
2021-08-02 14:06:41 -07:00
Junio C Hamano e163f73b7b Merge branch 'ps/perf-with-separate-output-directory'
Test update.

* ps/perf-with-separate-output-directory:
  perf: fix when running with TEST_OUTPUT_DIRECTORY
2021-08-02 14:06:41 -07:00
Ævar Arnfjörð Bjarmason 482e1488a9 t0001: fix broken not-quite getcwd(3) test in bed67874e2
With a54e938e5b (strbuf: support long paths w/o read rights in
strbuf_getcwd() on FreeBSD, 2017-03-26) we had t0001 break on systems
like OpenBSD and AIX whose getcwd(3) has standard (but not like glibc
et al) behavior.

This was partially fixed in bed67874e2 (t0001: skip test with
restrictive permissions if getpwd(3) respects them, 2017-08-07).

The problem with that fix is that while its analysis of the problem is
correct, it doesn't actually call getcwd(3), instead it invokes "pwd
-P". There is no guarantee that "pwd -P" is going to call getcwd(3),
as opposed to e.g. being a shell built-in.

On AIX under both bash and ksh this test breaks because "pwd -P" will
happily display the current working directory, but getcwd(3) called by
the "git init" we're testing here will fail to get it.

I checked whether clobbering the $PWD environment variable would
affect it, and it didn't. Presumably these shells keep track of their
working directory internally.

There's possible follow-up work here in teaching strbuf_getcwd() to
get the working directory with whatever method "pwd" uses on these
platforms. See [1] for a discussion of that, but let's take the easy
way out here and just skip these tests by fixing the
GETCWD_IGNORES_PERMS prerequisite to match the limitations of
strbuf_getcwd().

1. https://lore.kernel.org/git/b650bef5-d739-d98d-e9f1-fa292b6ce982@web.de/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30 10:18:27 -07:00
Junio C Hamano 1d07640b65 Merge branch 'ps/t0000-output-directory-fix'
"TEST_OUTPUT_DIRECTORY=there make test" failed to work, which has
been corrected.

* ps/t0000-output-directory-fix:
  t0000: fix test if run with TEST_OUTPUT_DIRECTORY
2021-07-28 13:18:05 -07:00
Junio C Hamano 7f554a4f69 Merge branch 'tb/reverse-midx'
The code that gives an error message in "git multi-pack-index" when
no subcommand is given tried to print a NULL pointer as a strong,
which has been corrected.

* tb/reverse-midx:
  multi-pack-index: fix potential segfault without sub-command
2021-07-28 13:18:04 -07:00
Junio C Hamano fa8b225d86 Merge branch 'pb/dont-complete-aliased-options'
The completion support used to offer alternate spelling of options
that exist only for compatibility, which has been corrected.

* pb/dont-complete-aliased-options:
  parse-options: don't complete option aliases by default
2021-07-28 13:18:03 -07:00
Junio C Hamano b271a3034f Merge branch 'ds/status-with-sparse-index'
"git status" codepath learned to work with sparsely populated index
without hydrating it fully.

* ds/status-with-sparse-index:
  t1092: document bad sparse-checkout behavior
  fsmonitor: integrate with sparse index
  wt-status: expand added sparse directory entries
  status: use sparse-index throughout
  status: skip sparse-checkout percentage with sparse-index
  diff-lib: handle index diffs with sparse dirs
  dir.c: accept a directory as part of cone-mode patterns
  unpack-trees: unpack sparse directory entries
  unpack-trees: rename unpack_nondirectories()
  unpack-trees: compare sparse directories correctly
  unpack-trees: preserve cache_bottom
  t1092: add tests for status/add and sparse files
  t1092: expand repository data shape
  t1092: replace incorrect 'echo' with 'cat'
  sparse-index: include EXTENDED flag when expanding
  sparse-index: skip indexes with unmerged entries
2021-07-28 13:18:02 -07:00
Junio C Hamano 5bae927222 Merge branch 'ab/pkt-line-tests'
Tests that cover protocol bits have been updated and helpers
used there have been consolidated.

* ab/pkt-line-tests:
  test-lib-functions: use test-tool for [de]packetize()
2021-07-28 13:18:00 -07:00
Junio C Hamano 4d4c8ddd37 Merge branch 'jk/t0000-subtests-fix'
Test fix.

* jk/t0000-subtests-fix:
  t0000: clear GIT_SKIP_TESTS before running sub-tests
2021-07-28 13:18:00 -07:00
Junio C Hamano dd6d3c90ee Merge branch 'ab/attribute-format'
Many "printf"-like helper functions we have have been annotated
with __attribute__() to catch placeholder/parameter mismatches.

* ab/attribute-format:
  advice.h: add missing __attribute__((format)) & fix usage
  *.h: add a few missing __attribute__((format))
  *.c static functions: add missing __attribute__((format))
  sequencer.c: move static function to avoid forward decl
  *.c static functions: don't forward-declare __attribute__
2021-07-28 13:17:59 -07:00
Junio C Hamano c9d6d8a193 Merge branch 'jk/log-decorate-optim'
Optimize "git log" for cases where we wasted cycles to load ref
decoration data that may not be needed.

* jk/log-decorate-optim:
  load_ref_decorations(): fix decoration with tags
  add_ref_decoration(): rename s/type/deco_type/
  load_ref_decorations(): avoid parsing non-tag objects
  object.h: add lookup_object_by_type() function
  object.h: expand docstring for lookup_unknown_object()
  log: avoid loading decorations for userformats that don't need it
  pretty.h: update and expand docstring for userformat_find_requirements()
2021-07-28 13:17:58 -07:00
Junio C Hamano 01369fdfd3 Merge branch 'sm/worktree-add-lock'
"git worktree add --lock" learned to record why the worktree is
locked with a custom message.

* sm/worktree-add-lock:
  worktree: teach `add` to accept --reason <string> with --lock
  worktree: mark lock strings with `_()` for translation
  t2400: clean up '"add" worktree with lock' test
2021-07-28 13:17:58 -07:00
Junio C Hamano e5cc59c77c Merge branch 'ew/many-alternate-optim'
Optimization for repositories with many alternate object store.

* ew/many-alternate-optim:
  oidtree: a crit-bit tree for odb_loose_cache
  oidcpy_with_padding: constify `src' arg
  make object_directory.loose_objects_subdir_seen a bitmap
  avoid strlen via strbuf_addstr in link_alt_odb_entry
  speed up alt_odb_usable() with many alternates
2021-07-28 13:17:57 -07:00
Junio C Hamano 14793a4e37 Merge branch 'hj/commit-allow-empty-message'
"git commit --allow-empty-message" won't abort the operation upon
an empty message, but the hint shown in the editor said otherwise.

* hj/commit-allow-empty-message:
  commit: remove irrelavent prompt on `--allow-empty-message`
  commit: reorganise commit hint strings
2021-07-28 13:17:57 -07:00
Philippe Blain e082631e51 merge: apply autostash if merge strategy fails
Since 'git merge' learned '--autostash' in a03b55530a (merge: teach
--autostash option, 2020-04-07), 'cmd_merge', once it is determined that
we have to create a merge commit, calls 'create_autostash' if
'--autostash' is given.

As explained in a03b55530a, and made more abvious by the tests added in
that commit, the autostash is then applied if the merge succeeds, either
directly or by committing (after conflict resolution or if '--no-commit'
was given), or if the merge is aborted with 'git merge --abort'. In some
other cases, like the user calling 'git reset --merge' or 'git merge
--quit', the autostash is not applied, but saved in the stash list.

However, there exists a scenario that creates an autostash but does not
apply nor save it to the stash list: if the chosen merge strategy
completely fails to handle the merge, i.e. 'try_merge_strategy' returns
2.

Apply the autostash in that case also. An easy way to test that is to
try to merge more than two commits but explicitely ask for the 'recursive'
merge strategy.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-23 15:45:40 -07:00
Philippe Blain 12510bd5da merge: apply autostash if fast-forward fails
Since 'git merge' learned '--autostash' in a03b55530a (merge: teach
--autostash option, 2020-04-07), 'cmd_merge', in the fast-forward case,
calls 'create_autostash' before calling 'checkout_fast_forward' if
'--autostash' is given.

However, if 'checkout_fast_forward' fails, the autostash is not applied
to the working tree, nor saved in the stash list, since the code simply
calls 'goto done'.

Be more helpful to the user by applying the autostash in that case.

An easy way to test a failing fast-forward is when we are merging a
branch that has a tracked file that conflicts with an untracked file in
the working tree.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-23 15:45:38 -07:00
Jonathan Tan d3da223f22 cache-tree: prefetch in partial clone read-tree
"git read-tree" checks the existence of the blobs referenced by the
given tree, but does not bulk prefetch them. Add a bulk prefetch.

The lack of prefetch here was noticed at $DAYJOB during a merge
involving some specific commits, but I couldn't find a minimal merge
that didn't also trigger the prefetch in check_updates() in
unpack-trees.c (and in all these cases, the lack of prefetch in
cache-tree.c didn't matter because all the relevant blobs would have
already been prefetched by then). This is why I used read-tree here to
exercise this code path.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-23 14:22:21 -07:00
Ævar Arnfjörð Bjarmason 63766510a1 bundle tests: use test_cmp instead of grep
Change the bundle tests to fully compare the expected "git ls-remote"
or "git bundle list-heads" output, instead of merely grepping it.

This avoids subtle regressions in the tests. In
f62e0a39b6 (t5704 (bundle): add tests for bundle --stdin, 2010-04-19)
the "bundle --stdin <rev-list options>" test was added to make sure we
didn't include the tag.

But since the --stdin mode didn't work until 5bb0fd2cab (bundle:
arguments can be read from stdin, 2021-01-11) our grepping of
"master" (later "main") missed the important part of the test.

Namely that we should not include the "refs/tags/tag" tag in that
case. Since the test only grepped for "main" in the output we'd miss a
regression in that code.

So let's use test_cmp instead, and also in the other nearby tests
where it's easy.

This does make things a bit more verbose in the case of the test
that's checking the bundle header, since it's different under SHA1 and
SHA256. I think this makes test easier to follow.

I've got some WIP changes to extend the "git bundle" command to dump
parts of the header out, which are easier to understand if we test the
output explicitly like this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-22 13:29:32 -07:00
Ævar Arnfjörð Bjarmason 95cf6464dd bundle tests: use ">file" not ": >file"
Change uses of ":" on the LHS of a ">" to the more commonly used
">file" pattern in t/t5607-clone-bundle.sh.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-22 13:29:30 -07:00
Junio C Hamano fe3fec53a6 Merge branch 'bc/rev-list-without-commit-line'
"git rev-list" learns to omit the "commit <object-name>" header
lines from the output with the `--no-commit-header` option.

* bc/rev-list-without-commit-line:
  rev-list: add option for --pretty=format without header
2021-07-22 13:05:56 -07:00
Junio C Hamano 8de2e2e41b Merge branch 'ab/send-email-optim'
"git send-email" optimization.

* ab/send-email-optim:
  perl: nano-optimize by replacing Cwd::cwd() with Cwd::getcwd()
  send-email: move trivial config handling to Perl
  perl: lazily load some common Git.pm setup code
  send-email: lazily load modules for a big speedup
  send-email: get rid of indirect object syntax
  send-email: use function syntax instead of barewords
  send-email: lazily shell out to "git var"
  send-email: lazily load config for a big speedup
  send-email: copy "config_regxp" into git-send-email.perl
  send-email: refactor sendemail.smtpencryption config parsing
  send-email: remove non-working support for "sendemail.smtpssl"
  send-email tests: test for boolean variables without a value
  send-email tests: support GIT_TEST_PERL_FATAL_WARNINGS=true
2021-07-22 13:05:54 -07:00
Derrick Stolee e05cdb17e8 unpack-trees: resolve sparse-directory/file conflicts
When running unpack_trees() with a sparse index, we attempt to operate
on the index without expanding the sparse directory entries. Thus, we
operate by manipulating entire directories and passing them to the
unpack function. In the case of the 'git checkout' command, this is the
twoway_merge() function.

There are several cases in twoway_merge() that handle different
situations. One new one to add is the case of a directory/file conflict
where the directory is sparse. Before the sparse index, such a conflict
would appear as a list of file additions and deletions. Now,
twoway_merge() initializes 'current', 'oldtree', and 'newtree' from
src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is
equal to the df_conflict_entry. The way to determine that we have a
directory/file conflict is to test that 'current' and 'newtree' disagree
on being sparse directory entries.

When we are in this case, we want to resolve the situation by calling
merged_entry(). This allows replacing the 'current' entry with the
'newtree' entry. This is important for cases where we want to run 'git
checkout' across the conflict and have the new HEAD represent the new
file type at that path. The first NEEDSWORK comment dropped in t1092
demonstrates this necessary behavior.

However, we still are in a confusing state when 'current' corresponds to
a staged change within a sparse directory that is not present at HEAD.
This should be atypical, because it requires adding a change outside of
the sparse-checkout cone, but it is possible. Since we are unable to
determine that this is a staged change within twoway_merge(), we cannot
add a case to reject the merge at this point. I believe this is due to
the use of df_conflict_entry in the place of 'oldtree' instead of using
the valud at HEAD, which would provide some perspective to this
decision. Any change that would allow this differentiation for staged
entries would need to involve information further up in unpack_trees().

That work should be done, sometime, because we are further confusing the
behavior of a directory/file conflict when staging a change in the
directory. The two cases 'checkout behaves oddly with df-conflict-?' in
t1092 demonstrate that even without a sparse-checkout, Git is not
consistent in its behavior. Neither of the two options seems correct,
either. This change makes the sparse-index behave differently than the
typcial sparse-checkout case, but it does match the full checkout
behavior in the df-conflict-2 case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:59:11 -07:00
Derrick Stolee 70569fadce t1092: document bad 'git checkout' behavior
Add new branches to the test repo that demonstrate directory/file
conflicts in different ways. Since the directory 'folder1/' has
adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes
searches for 'folder1/' to land in a different place in the index than a
search for 'folder1'. This causes a change in behavior when working with
the df-conflict-1 and df-conflict-2 branches, whose only difference is
that the first uses 'folder1' as the conflict and the other uses
'folder2' which does not have these adjacent files.

We can extend two tests that compare the behavior across different 'git
checkout' commands, and we see already that the behavior will be
different in some cases and not in others. The difference between the
two test loops is that one uses 'git reset --hard' between iterations.

Further, we isolate the behavior of creating a staged change within a
directory and then checking out a branch where that directory is
replaced with a file. A full checkout behaves differently across these
two cases, while a sparse-checkout cone behaves consistently. In both
cases, the behavior is wrong. In one case, the staged change is dropped
entirely. The other case the staged change is kept, replacing the file
at that location, but none of the other files in the directory are kept.

Likely, the correct behavior in this case is to reject the checkout and
report the conflict, leaving HEAD in its previous location. None of the
cases behave this way currently. Use comments to demonstrate that the
tested behavior is only a documentation of the current, incorrect
behavior to ensure we do not _accidentally_ change it. Instead, we would
prefer to change it on purpose with a future change.

At this point, the sparse-index does not handle these 'git checkout'
commands correctly. Or rather, it _does_ reject the 'git checkout' when
we have the staged change, but for the wrong reason. It also rejects the
'git checkout' commands when there is no staged change and we want to
replace a directory with a file. A fix for that unstaged case will
follow in the next change, but that will make the sparse-index agree
with the full checkout case in these documented incorrect behaviors.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:59:11 -07:00
Elijah Newren 8b09a900a1 merge-ort: restart merge with cached renames to reduce process entry cost
The merge algorithm mostly consists of the following three functions:
   collect_merge_info()
   detect_and_process_renames()
   process_entries()
Prior to the trivial directory resolution optimization of the last half
dozen commits, process_entries() was consistently the slowest, followed
by collect_merge_info(), then detect_and_process_renames().  When the
trivial directory resolution applies, it often dramatically decreases
the amount of time spent in the two slower functions.

Looking at the performance results in the previous commit, the trivial
directory resolution optimization helps amazingly well when there are no
relevant renames.  It also helps really well when reapplying a long
series of linear commits (such as in a rebase or cherry-pick), since the
relevant renames may well be cached from the first reapplied commit.
But when there are any relevant renames that are not cached (represented
by the just-one-mega testcase), then the optimization does not help at
all.

Often, I noticed that when the optimization does not apply, it is
because there are a handful of relevant sources -- maybe even only one.
It felt frustrating to need to recurse into potentially hundreds or even
thousands of directories just for a single rename, but it was needed for
correctness.

However, staring at this list of functions and noticing that
process_entries() is the most expensive and knowing I could avoid it if
I had cached renames suggested a simple idea: change
   collect_merge_info()
   detect_and_process_renames()
   process_entries()
into
   collect_merge_info()
   detect_and_process_renames()
   <cache all the renames, and restart>
   collect_merge_info()
   detect_and_process_renames()
   process_entries()

This may seem odd and look like more work.  However, note that although
we run collect_merge_info() twice, the second time we get to employ
trivial directory resolves, which makes it much faster, so the increased
time in collect_merge_info() is small.  While we run
detect_and_process_renames() again, all renames are cached so it's
nearly a no-op (we don't call into diffcore_rename_extended() but we do
have a little bit of data structure checking and fixing up).  And the
big payoff comes from the fact that process_entries(), will be much
faster due to having far fewer entries to process.

This restarting only makes sense if we can save recursing into enough
directories to make it worth our while.  Introduce a simple heuristic to
guide this.  Note that this heuristic uses a "wanted_factor" that I have
virtually no actual real world data for, just some back-of-the-envelope
quasi-scientific calculations that I included in some comments and then
plucked a simple round number out of thin air.  It could be that
tweaking this number to make it either higher or lower improves the
optimization.  (There's slightly more here; when I first introduced this
optimization, I used a factor of 10, because I was completely confident
it was big enough to not cause slowdowns in special cases.  I was
certain it was higher than needed.  Several months later, I added the
rough calculations which make me think the optimal number is close to 2;
but instead of pushing to the limit, I just bumped it to 3 to reduce the
risk that there are special cases where this optimization can result in
slowing down the code a little.  If the ratio of path counts is below 3,
we probably will only see minor performance improvements at best
anyway.)

Also, note that while the diffstat looks kind of long (nearly 100
lines), more than half of it is in two comments explaining how things
work.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      205.1  ms ±  3.8  ms   204.2  ms ±  3.0  ms
    mega-renames:      1.564 s ±  0.010 s     1.076 s ±  0.015 s
    just-one-mega:   479.5  ms ±  3.9  ms   364.1  ms ±  7.0  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:47:40 -07:00
Patrick Steinhardt ade1552598 t0000: fix test if run with TEST_OUTPUT_DIRECTORY
Testcases in t0000 are quite special given that they many of them run
nested testcases to verify that testing functionality itself works as
expected. These nested testcases are realized by writing a new ad-hoc
test script which again sources test-lib.sh, where the new script is
created in a nested subdirectory located beneath the current trash
directory. We then execute the new test script with the nested
subdirectory as current working directory and explicitly re-export
TEST_OUTPUT_DIRECTORY to point to that directory.

While this works as expected in the general case, it falls apart when
the developer has TEST_OUTPUT_DIRECTORY explicitly defined either via
the environment or via config.mak and runs "make test". In that case,
test-lib.sh will clobber the value that we've just carefully set up to
instead contain what the developer has defined. As a result, the
TEST_OUTPUT_DIRECTORY continues to point at the root output directory,
not at the nested one.

This issue causes breakage in the 'test_atexit is run' test case: the
nested test case writes files into "../../", which is assumed to be the
parent's trash directory. But because TEST_OUTPUT_DIRECTORY already
points to to the root output directory, we instead end up writing those
files outside of the output directory. The parent test case will then
try to check whether those files still exist in its own trash directory,
which thus must fail now.

Fix the issue by adding a new TEST_OUTPUT_DIRECTORY_OVERRIDE variable.
If set, then we'll always override the TEST_OUTPUT_DIRECTORY with its
value after sourcing GIT-BUILD-OPTIONS.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 09:19:02 -07:00
Taylor Blau 88617d11f9 multi-pack-index: fix potential segfault without sub-command
Since cd57bc41bb (builtin/multi-pack-index.c: display usage on
unrecognized command, 2021-03-30) we have used a "usage" label to avoid
having two separate callers of usage_with_options (one when no arguments
are given, and another for unrecognized sub-commands).

But the first caller has been broken since cd57bc41bb, since it will
happily jump to usage without arguments, and then pass argv[0] to the
"unrecognized subcommand" error.

Many compilers will save us from a segfault here, but the end result is
ugly, since it mentions an unrecognized subcommand when we didn't even
pass one, and (on GCC) includes "(null)" in its output.

Move the "usage" label down past the error about unrecognized
subcommands so that it is only triggered when it should be. While we're
at it, bulk up our test coverage in this area, too.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-19 15:24:01 -07:00
Jeff King ac223c4047 t0000: clear GIT_SKIP_TESTS before running sub-tests
In t0000, we run several fake "sub-test" suites to verify the behavior
of the test suite. But because we don't clear the parent environment
completely, the sub-tests can be fooled by variables meant for the
parent. For example:

  GIT_SKIP_TESTS=t1234 ./t0000-basic.sh

fails when a sub-test expects its fake t1234 to actually run. This
particular pattern is unlikely in practice; we're running a single
script, and there is no t1234 in the real test suite anyway (not yet, at
least). A more real-world example is:

  GIT_SKIP_TESTS=t[^0]* make test

to run only the t0* tests.

The fix is conceptually simple: we should clear the GIT_SKIP_TESTS
variable when running the sub-tests, because its contents (if any) will
be meant for the main test suite. This is easy to do centrally in our
sub-test helper.

But there's a catch: some of our tests do set GIT_SKIP_TESTS
intentionally to test the feature. We need to allow them to continue to
set it, but clear it for all the other tests. And the sub-test helper
can't tell if the GIT_SKIP_TESTS it sees is from a test or not. We can
handle this by adding a new option to the helper to let callers specify
the skip list.

I considered adding a more general "--eval" option to let callers set up
the env for the sub-test however they like. That would cover this case
and possible future ones. But the quoting gets awkward for the callers
(since we're now 2 layers deep in evals!), so I went with the simpler
more specific solution.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-19 13:26:00 -07:00
Ævar Arnfjörð Bjarmason 64f0109f17 test-lib-functions: use test-tool for [de]packetize()
The shell+perl "[de]packetize()" helper functions were added in
4414a15002 (t/lib-git-daemon: add network-protocol helpers,
2018-01-24), and around the same time we added the "pkt-line" helper
in 74e7002961 (test-pkt-line: introduce a packet-line test helper,
2018-03-14).

For some reason it seems we've mostly used the shell+perl version
instead of the helper since then. There were discussions around
88124ab263 (test-lib-functions: make packetize() more efficient,
2020-03-27) and cacae4329f (test-lib-functions: simplify packetize()
stdin code, 2020-03-29) to improve them and make them more efficient.

There was one good reason to do so, we needed an equivalent of
"test-tool pkt-line pack", but that command wasn't capable of handling
input with "\n" (a feature) or "\0" (just because it happens to be
printf-based under the hood).

Let's add a "pkt-line-raw" helper for that, and expose is at a
packetize_raw() to go with the existing packetize() on the shell
level, this gives us the smallest amount of change to the tests
themselves.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-19 11:53:50 -07:00
Junio C Hamano 8e62a85352 Merge branch 'ds/gender-neutral-doc'
Update the documentation not to assume users are of certain gender
and adds to guidelines to do so.

* ds/gender-neutral-doc:
  *: fix typos
  comments: avoid using the gender of our users
  doc: avoid using the gender of other people
2021-07-16 17:42:53 -07:00
Junio C Hamano 8721e2eaed Merge branch 'jt/partial-clone-submodule-1'
Prepare the internals for lazily fetching objects in submodules
from their promisor remotes.

* jt/partial-clone-submodule-1:
  promisor-remote: teach lazy-fetch in any repo
  run-command: refactor subprocess env preparation
  submodule: refrain from filtering GIT_CONFIG_COUNT
  promisor-remote: support per-repository config
  repository: move global r_f_p_c to repo struct
2021-07-16 17:42:53 -07:00
Junio C Hamano 832a239b72 Merge branch 'dd/test-stdout-count-lines'
Tiny test clean-up.

* dd/test-stdout-count-lines:
  t6402: preserve git exit status code
  t6400: preserve git ls-files exit status code
  test-lib-functions: introduce test_stdout_line_count
2021-07-16 17:42:52 -07:00
Junio C Hamano c4670b8a8d Merge branch 'hn/refs-test-cleanup'
Test clean-up.

* hn/refs-test-cleanup:
  t7509: avoid direct file access for writing CHERRY_PICK_HEAD
  t1415: avoid direct filesystem access for writing refs
2021-07-16 17:42:52 -07:00
Junio C Hamano 3cc43bff9c Merge branch 'ab/mktag-tests'
Fill test gaps.

* ab/mktag-tests:
  mktag tests: test fast-export
  mktag tests: test for-each-ref
  mktag tests: test update-ref and reachable fsck
  mktag tests: test hash-object --literally and unreachable fsck
  mktag tests: invert --no-strict test
  mktag tests: parse out options in helper
2021-07-16 17:42:48 -07:00
Junio C Hamano 1fb3445658 Merge branch 'ab/show-branch-tests'
Fill test gaps.

* ab/show-branch-tests:
  show-branch tests: add missing tests
  show-branch: don't <COLOR></RESET> for space characters
  show-branch tests: modernize test code
  show-branch tests: rename the one "show-branch" test file
2021-07-16 17:42:48 -07:00
Junio C Hamano b2fc822629 Merge branch 'ab/fetch-negotiate-segv-fix'
Code recently added to support common ancestry negotiation during
"git push" did not sanity check its arguments carefully enough.

* ab/fetch-negotiate-segv-fix:
  fetch: fix segfault in --negotiate-only without --negotiation-tip=*
  fetch: document the --negotiate-only option
  send-pack.c: move "no refs in common" abort earlier
2021-07-16 17:42:48 -07:00
Junio C Hamano 3b57e72c0c Merge branch 'tb/midx-use-checksum'
When rebuilding the multi-pack index file reusing an existing one,
we used to blindly trust the existing file and ended up carrying
corrupted data into the updated file, which has been corrected.

* tb/midx-use-checksum:
  midx: report checksum mismatches during 'verify'
  midx: don't reuse corrupt MIDXs when writing
  commit-graph: rewrite to use checksum_valid()
  csum-file: introduce checksum_valid()
2021-07-16 17:42:46 -07:00