Commit graph

18583 commits

Author SHA1 Message Date
Junio C Hamano
a86ed75f32 Merge branch 'rs/make-verify-path-really-verify-again'
Recent sparse-index work broke safety against attempts to add paths
with trailing slashes to the index, which has been corrected.

* rs/make-verify-path-really-verify-again:
  read-cache: let verify_path() reject trailing dir separators again
  read-cache: add verify_path_internal()
  t3905: show failure to ignore sub-repo
2021-10-18 15:47:58 -07:00
Junio C Hamano
092228ee5c Merge branch 'jk/cat-file-batch-all-wo-replace'
"git cat-file --batch" with the "--batch-all-objects" option is
supposed to iterate over all the objects found in a repository, but
it used to translate these object names using the replace mechanism,
which defeats the point of enumerating all objects in the repository.
This has been corrected.

* jk/cat-file-batch-all-wo-replace:
  cat-file: use packed_object_info() for --batch-all-objects
  cat-file: split ordered/unordered batch-all-objects callbacks
  cat-file: disable refs/replace with --batch-all-objects
  cat-file: mention --unordered along with --batch-all-objects
  t1006: clean up broken objects
2021-10-18 15:47:57 -07:00
Junio C Hamano
0b69bb0fb1 Merge branch 'tb/repack-write-midx'
"git repack" has been taught to generate multi-pack reachability
bitmaps.

* tb/repack-write-midx:
  test-read-midx: fix leak of bitmap_index struct
  builtin/repack.c: pass `--refs-snapshot` when writing bitmaps
  builtin/repack.c: make largest pack preferred
  builtin/repack.c: support writing a MIDX while repacking
  builtin/repack.c: extract showing progress to a variable
  builtin/repack.c: rename variables that deal with non-kept packs
  builtin/repack.c: keep track of existing packs unconditionally
  midx: preliminary support for `--refs-snapshot`
  builtin/multi-pack-index.c: support `--stdin-packs` mode
  midx: expose `write_midx_file_only()` publicly
2021-10-18 15:47:57 -07:00
Junio C Hamano
223a1bfb58 Merge branch 'js/retire-preserve-merges'
The "--preserve-merges" option of "git rebase" has been removed.

* js/retire-preserve-merges:
  sequencer: restrict scope of a formerly public function
  rebase: remove a no-longer-used function
  rebase: stop mentioning the -p option in comments
  rebase: remove obsolete code comment
  rebase: drop the internal `rebase--interactive` command
  git-svn: drop support for `--preserve-merges`
  rebase: drop support for `--preserve-merges`
  pull: remove support for `--rebase=preserve`
  tests: stop testing `git rebase --preserve-merges`
  remote: warn about unhandled branch.<name>.rebase values
  t5520: do not use `pull.rebase=preserve`
2021-10-18 15:47:56 -07:00
Junio C Hamano
0ef08090d2 Merge branch 'rs/mergesort'
The mergesort implementation used to sort linked list has been
optimized.

* rs/mergesort:
  test-mergesort: use repeatable random numbers
  mergesort: use ranks stack
  p0071: test performance of llist_mergesort()
  p0071: measure sorting of already sorted and reversed files
  test-mergesort: add unriffle_skewed mode
  test-mergesort: add unriffle mode
  test-mergesort: add generate subcommand
  test-mergesort: add test subcommand
  test-mergesort: add sort subcommand
  test-mergesort: use strbuf_getline()
2021-10-18 15:47:56 -07:00
Junio C Hamano
d7bc852151 Merge branch 'ab/align-parse-options-help'
When "git cmd -h" shows more than one line of usage text (e.g.
the cmd subcommand may take sub-sub-command), parse-options API
learned to align these lines, even across i18n/l10n.

* ab/align-parse-options-help:
  parse-options: properly align continued usage output
  git rev-parse --parseopt tests: add more usagestr tests
  send-pack: properly use parse_options() API for usage string
  parse-options API users: align usage output in C-strings
2021-10-13 15:15:58 -07:00
Junio C Hamano
62f035aee3 Merge branch 'ab/help-config-vars'
Teach "git help -c" into helping the command line completion of
configuration variables.

* ab/help-config-vars:
  help: move column config discovery to help.c library
  help / completion: make "git help" do the hard work
  help tests: test --config-for-completion option & output
  help: simplify by moving to OPT_CMDMODE()
  help: correct logic error in combining --all and --guides
  help: correct logic error in combining --all and --config
  help tests: add test for --config output
  help: correct usage & behavior of "git help --guides"
  help: correct the usage string in -h and documentation
2021-10-13 15:15:58 -07:00
Junio C Hamano
af303ee392 Merge branch 'jh/builtin-fsmonitor-part1'
Built-in fsmonitor (part 1).

* jh/builtin-fsmonitor-part1:
  t/helper/simple-ipc: convert test-simple-ipc to use start_bg_command
  run-command: create start_bg_command
  simple-ipc/ipc-win32: add Windows ACL to named pipe
  simple-ipc/ipc-win32: add trace2 debugging
  simple-ipc: move definition of ipc_active_state outside of ifdef
  simple-ipc: preparations for supporting binary messages.
  trace2: add trace2_child_ready() to report on background children
2021-10-13 15:15:58 -07:00
Junio C Hamano
cf006037bf Merge branch 'ab/lib-subtest'
Updates to the tests in t0000 to test the test framework.

* ab/lib-subtest:
  test-lib tests: get rid of copy/pasted mock test code
  test-lib tests: assert 1 exit code, not non-zero
  test-lib tests: refactor common part of check_sub_test_lib_test*()
  test-lib tests: avoid subshell for "test_cmp" for readability
  test-lib tests: don't provide a description for the sub-tests
  test-lib tests: split up "write and run" into two functions
  test-lib tests: move "run_sub_test" to a new lib-subtest.sh
2021-10-13 15:15:57 -07:00
Junio C Hamano
a7c2daa06d Merge branch 'en/removing-untracked-fixes'
Various fixes in code paths that move untracked files away to make room.

* en/removing-untracked-fixes:
  Documentation: call out commands that nuke untracked files/directories
  Comment important codepaths regarding nuking untracked files/dirs
  unpack-trees: avoid nuking untracked dir in way of locally deleted file
  unpack-trees: avoid nuking untracked dir in way of unmerged file
  Change unpack_trees' 'reset' flag into an enum
  Remove ignored files by default when they are in the way
  unpack-trees: make dir an internal-only struct
  unpack-trees: introduce preserve_ignored to unpack_trees_options
  read-tree, merge-recursive: overwrite ignored files by default
  checkout, read-tree: fix leak of unpack_trees_options.dir
  t2500: add various tests for nuking untracked files
2021-10-13 15:15:57 -07:00
Junio C Hamano
1fdfb774aa Merge branch 'mt/grep-submodule-textconv'
"git grep --recurse-submodules" takes trees and blobs from the
submodule repository, but the textconv settings when processing a
blob from the submodule is not taken from the submodule repository.
A test is added to demonstrate the issue, without fixing it.

* mt/grep-submodule-textconv:
  grep: demonstrate bug with textconv attributes and submodules
2021-10-13 15:15:56 -07:00
Junio C Hamano
2d498a7c89 Merge branch 'ds/add-rm-with-sparse-index'
"git add", "git mv", and "git rm" have been adjusted to avoid
updating paths outside of the sparse-checkout definition unless
the user specifies a "--sparse" option.

* ds/add-rm-with-sparse-index:
  advice: update message to suggest '--sparse'
  mv: refuse to move sparse paths
  rm: skip sparse paths with missing SKIP_WORKTREE
  rm: add --sparse option
  add: update --renormalize to skip sparse paths
  add: update --chmod to skip sparse paths
  add: implement the --sparse option
  add: skip tracked paths outside sparse-checkout cone
  add: fail when adding an untracked sparse file
  dir: fix pattern matching on dirs
  dir: select directories correctly
  t1092: behavior for adding sparse files
  t3705: test that 'sparse_entry' is unstaged
2021-10-13 15:15:56 -07:00
Junio C Hamano
68ef6c0b1a Merge branch 'tb/aggregate-ignore-leading-whitespaces'
Test portability update.

* tb/aggregate-ignore-leading-whitespaces:
  t/perf/aggregate.perl: tolerate leading spaces
2021-10-11 10:21:49 -07:00
Junio C Hamano
252caf8e41 Merge branch 'rs/p3400-lose-tac'
Test portability update.

* rs/p3400-lose-tac:
  p3400: stop using tac(1)
2021-10-11 10:21:49 -07:00
Junio C Hamano
0cc4ec1550 Merge branch 'da/difftool'
Code clean-up in "git difftool".

* da/difftool:
  difftool: add a missing space to the run_dir_diff() comments
  difftool: remove an unnecessary call to strbuf_release()
  difftool: refactor dir-diff to write files using helper functions
  difftool: create a tmpdir path without repeated slashes
2021-10-11 10:21:48 -07:00
Junio C Hamano
404c4a5462 Merge branch 'ab/designated-initializers'
Code clean-up.

* ab/designated-initializers:
  cbtree.h: define cb_init() in terms of CBTREE_INIT
  *.h: move some *_INIT to designated initializers
  *.h _INIT macros: don't specify fields equal to 0
  *.[ch] *_INIT macros: use { 0 } for a "zero out" idiom
  submodule-config.h: remove unused SUBMODULE_INIT macro
2021-10-11 10:21:48 -07:00
Junio C Hamano
859a585bdf Merge branch 'ab/sanitize-leak-ci'
CI learns to run the leak sanitizer builds.

* ab/sanitize-leak-ci:
  tests: add a test mode for SANITIZE=leak, run it in CI
  Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS
2021-10-11 10:21:47 -07:00
Junio C Hamano
f6c075ad71 Merge branch 'jk/ref-paranoia'
The ref iteration code used to optionally allow dangling refs to be
shown, which has been tightened up.

* jk/ref-paranoia:
  refs: drop "broken" flag from for_each_fullref_in()
  ref-filter: drop broken-ref code entirely
  ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
  repack, prune: drop GIT_REF_PARANOIA settings
  refs: turn on GIT_REF_PARANOIA by default
  refs: omit dangling symrefs when using GIT_REF_PARANOIA
  refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
  refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
  refs-internal.h: move DO_FOR_EACH_* flags next to each other
  t5312: be more assertive about command failure
  t5312: test non-destructive repack
  t5312: create bogus ref as necessary
  t5312: drop "verbose" helper
  t5600: provide detached HEAD for corruption failures
  t5516: don't use HEAD ref for invalid ref-deletion tests
  t7900: clean up some more broken refs
2021-10-11 10:21:47 -07:00
Junio C Hamano
ed4d535342 Merge branch 'sg/test-split-index-fix'
Test updates.

* sg/test-split-index-fix:
  read-cache: fix GIT_TEST_SPLIT_INDEX
  tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
  read-cache: look for shared index files next to the index, too
  t1600-index: disable GIT_TEST_SPLIT_INDEX
  t1600-index: don't run git commands upstream of a pipe
  t1600-index: remove unnecessary redirection
2021-10-11 10:21:47 -07:00
Junio C Hamano
9567a670d2 Merge branch 'tb/midx-write-propagate-namehash'
"git multi-pack-index write --bitmap" learns to propagate the
hashcache from original bitmap to resulting bitmap.

* tb/midx-write-propagate-namehash:
  t5326: test propagating hashcache values
  p5326: generate pack bitmaps before writing the MIDX bitmap
  p5326: don't set core.multiPackIndex unnecessarily
  p5326: create missing 'perf-tag' tag
  midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
  pack-bitmap.c: propagate namehash values from existing bitmaps
  t/helper/test-bitmap.c: add 'dump-hashes' mode
2021-10-11 10:21:46 -07:00
Jeff King
5c5b29b459 cat-file: disable refs/replace with --batch-all-objects
When we're enumerating all objects in the object database, it doesn't
make sense to respect refs/replace. The point of this option is to
enumerate all of the objects in the database at a low level. By
definition we'd already show the replacement object's contents (under
its real oid), and showing those contents under another oid is almost
certainly working against what the user is trying to do.

Note that you could make the same argument for something like:

  git show-index <foo.idx |
  awk '{print $2}' |
  git cat-file --batch

but there we can't know in cat-file exactly what the user intended,
because we don't know the source of the input. They could be trying to
do low-level debugging, or they could be doing something more high-level
(e.g., imagine a porcelain built around cat-file for its object
accesses). So in those cases, we'll have to rely on the user specifying
"git --no-replace-objects" to tell us what to do.

One _could_ make an argument that "cat-file --batch" is sufficiently
low-level plumbing that it should not respect replace-objects at all
(and the caller should do any replacement if they want it).  But we have
been doing so for some time. The history is a little tangled:

  - looking back as far as v1.6.6, we would not respect replace refs for
    --batch-check, but would for --batch (because the former used
    sha1_object_info(), and the replace mechanism only affected actual
    object reads)

  - this discrepancy was made even weirder by 98e2092b50 (cat-file:
    teach --batch to stream blob objects, 2013-07-10), where we always
    output the header using the --batch-check code, and then printed the
    object separately. This could lead to "cat-file --batch" dying (when
    it notices the size or type changed for a non-blob object) or even
    producing bogus output (in streaming mode, we didn't notice that we
    wrote the wrong number of bytes).

  - that persisted until 1f7117ef7a (sha1_file: perform object
    replacement in sha1_object_info_extended(), 2013-12-11), which then
    respected replace refs for both forms.

So it has worked reliably this way for over 7 years, and we should make
sure it continues to do so. That could also be an argument that
--batch-all-objects should not change behavior (which this patch is
doing), but I really consider the current behavior to be an unintended
bug. It's a side effect of how the code is implemented (feeding the oids
back into oid_object_info() rather than looking at what we found while
reading the loose and packed object storage).

The implementation is straight-forward: we just disable the global
read_replace_refs flag when we're in --batch-all-objects mode. It would
perhaps be a little cleaner to change the flag we pass to
oid_object_info_extended(), but that's not enough. We also read objects
via read_object_file() and stream_blob_to_fd(). The former could switch
to its _extended() form, but the streaming code has no mechanism for
disabling replace refs. Setting the global flag works, and as a bonus,
it's impossible to have any "oops, we're sometimes replacing the object
and sometimes not" bugs in the output (like the ones caused by
98e2092b50 above).

The tests here cover the regular-input and --batch-all-objects cases,
for both --batch-check and --batch. There is a test in t6050 that covers
the regular-input case with --batch already, but this new one goes much
further in actually verifying the output (plus covering --batch-check
explicitly). This is perhaps a little overkill and the tests would be
simpler just covering --batch-check, but I wanted to make sure we're
checking that --batch output is consistent between the header and the
content. The global-flag technique used here makes that easy to get
right, but this is future-proofing us against regressions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:45:14 -07:00
Jeff King
e879295b20 t1006: clean up broken objects
A few of the tests create intentionally broken objects with broken
types. Let's clean them up after we're done with them, so that later
tests don't get confused (we hadn't noticed because this only affects
tests which use --batch-all-objects, but I'm about to add more).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:45:14 -07:00
René Scharfe
c90cfc225b test-mergesort: use repeatable random numbers
Use MINSTD to generate pseudo-random numbers consistently instead of
using rand(3), whose output can vary from system to system, and reset
its seed before filling in the test values.  This gives repeatable
results across versions and systems, which simplifies sharing and
comparing of results between developers.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:04:56 -07:00
René Scharfe
c8ad9d04c6 read-cache: let verify_path() reject trailing dir separators again
6e773527b6 (sparse-index: convert from full to sparse, 2021-03-30) made
verify_path() accept trailing directory separators for directories,
which is necessary for sparse directory entries.  This clemency causes
"git stash" to stumble over sub-repositories, though, and there may be
more unintended side-effects.

Avoid them by restoring the old verify_path() behavior and accepting
trailing directory separators only in places that are supposed to handle
sparse directory entries.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 17:52:26 -07:00
René Scharfe
fc5e90b848 t3905: show failure to ignore sub-repo
"git stash" used to ignore sub-repositories until 6e773527b6
(sparse-index: convert from full to sparse, 2021-03-30).  Add a test
that demonstrates this regression.

Reported-by: Robert Leftwich <robert@gitpod.io>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 17:47:39 -07:00
Jeff King
e861b09636 test-read-midx: fix leak of bitmap_index struct
In read_midx_preferred_pack(), we open the bitmap index but never free
it. This isn't a big deal since this is just a test helper, and we exit
immediately after, but since we're trying to keep our leak-checking tidy
now, it's worth fixing.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 11:01:22 -07:00
Junio C Hamano
16119bac40 Merge branch 'lh/systemd-timers'
Testfix.

* lh/systemd-timers:
  maintenance: fix test t7900-maintenance.sh
2021-10-06 13:40:13 -07:00
Junio C Hamano
d8d33378ed Merge branch 'ab/repo-settings-cleanup'
Code cleanup.

* ab/repo-settings-cleanup:
  repository.h: don't use a mix of int and bitfields
  repo-settings.c: simplify the setup
  read-cache & fetch-negotiator: check "enum" values in switch()
  environment.c: remove test-specific "ignore_untracked..." variable
  wrapper.c: add x{un,}setenv(), and use xsetenv() in environment.c
2021-10-06 13:40:11 -07:00
Junio C Hamano
844cc43377 Merge branch 'tb/commit-graph-usage-fix'
Regression in "git commit-graph" command line parsing has been
corrected.

* tb/commit-graph-usage-fix:
  builtin/multi-pack-index.c: disable top-level --[no-]progress
  builtin/commit-graph.c: don't accept common --[no-]progress
2021-10-06 13:40:11 -07:00
Junio C Hamano
7cebe73dbd Merge branch 'pw/rebase-of-a-tag-fix'
"git rebase <upstream> <tag>" failed when aborted in the middle, as
it mistakenly tried to write the tag object instead of peeling it
to HEAD.

* pw/rebase-of-a-tag-fix:
  rebase: dereference tags
  rebase: use lookup_commit_reference_by_name()
  rebase: use our standard error return value
  t3407: rework rebase --quit tests
  t3407: strengthen rebase --abort tests
  t3407: use test_path_is_missing
  t3407: rename a variable
  t3407: use test_cmp_rev
  t3407: use test_commit
  t3407: run tests in $TEST_DIRECTORY
2021-10-06 13:40:11 -07:00
Junio C Hamano
921c795c25 Merge branch 'jt/add-submodule-odb-clean-up'
More code paths that use the hack to add submodule's object
database to the set of alternate object store have been cleaned up.

* jt/add-submodule-odb-clean-up:
  revision: remove "submodule" from opt struct
  repository: support unabsorbed in repo_submodule_init
  submodule: remove unnecessary unabsorbed fallback
2021-10-06 13:40:11 -07:00
Taylor Blau
76f3b69896 t/perf/aggregate.perl: tolerate leading spaces
When using `test_size` with `wc -c`, users on certain platforms can run
into issues when `wc` emits leading space characters in its output,
which confuses get_times.

Callers could switch to use test_file_size instead of `wc -c` (the
former never prints leading space characters, so will always work with
test_size regardless of platform), but this is an easy enough spot to
miss that we should teach get_times to be more tolerant of the input it
accepts.

Teach get_times to do just that by stripping any leading space
characters.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-04 14:12:28 -07:00
René Scharfe
100c2da2d3 p3400: stop using tac(1)
b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29)
added a perf test that calls tac(1) from GNU core utilities.  Support
systems without it by reversing the generated list using sort -nr
instead.  sort(1) with options -n and -r is already used in other tests.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-03 22:07:21 -07:00
Junio C Hamano
3a757d0369 Merge branch 'ah/connect-parse-feature-v0-fix'
Protocol v0 clients can get stuck parsing a malformed feature line.

* ah/connect-parse-feature-v0-fix:
  connect: also update offset for features without values
2021-10-03 21:49:21 -07:00
Junio C Hamano
cbb1ae05d5 Merge branch 'ds/perf-test-built-path-fix'
Perf test fix.

* ds/perf-test-built-path-fix:
  t/perf/run: fix bin-wrappers computation
2021-10-03 21:49:19 -07:00
Junio C Hamano
58e2bc452b Merge branch 'jk/http-redact-fix'
Sensitive data in the HTTP trace were supposed to be redacted, but
we failed to do so in HTTP/2 requests.

* jk/http-redact-fix:
  http: match headers case-insensitively when redacting
2021-10-03 21:49:19 -07:00
Junio C Hamano
6a4f5dadd3 Merge branch 'da/difftool-dir-diff-symlink-fix'
"git difftool --dir-diff" mishandled symbolic links.

* da/difftool-dir-diff-symlink-fix:
  difftool: fix symlink-file writing in dir-diff mode
2021-10-03 21:49:19 -07:00
Junio C Hamano
1030daecda Merge branch 'cb/cvsserver'
"git cvsserver" had a long-standing bug in its authentication code,
which has finally been corrected (it is unclear and is a separate
question if anybody is seriously using it, though).

* cb/cvsserver:
  Documentation: cleanup git-cvsserver
  git-cvsserver: protect against NULL in crypt(3)
  git-cvsserver: use crypt correctly to compare password hashes
2021-10-03 21:49:17 -07:00
Junio C Hamano
ac162a606b Merge branch 'jk/clone-unborn-head-in-bare'
"git clone" from a repository whose HEAD is unborn into a bare
repository didn't follow the branch name the other side used, which
is corrected.

* jk/clone-unborn-head-in-bare:
  clone: handle unborn branch in bare repos
2021-10-03 21:49:17 -07:00
Junio C Hamano
4a6fd7d3c7 Merge branch 'en/stash-df-fix'
"git stash", where the tentative change involves changing a
directory to a file (or vice versa), was confused, which has been
corrected.

* en/stash-df-fix:
  stash: restore untracked files AFTER restoring tracked files
  stash: avoid feeding directories to update-index
  t3903: document a pair of directory/file bugs
2021-10-03 21:49:16 -07:00
Taylor Blau
324efc90d1 builtin/repack.c: pass --refs-snapshot when writing bitmaps
To prevent the race described in an earlier patch, generate and pass a
reference snapshot to the multi-pack bitmap code, if we are writing one
from `git repack`.

This patch is mostly limited to creating a temporary file, and then
calling for_each_ref(). Except we try to minimize duplicates, since
doing so can drastically reduce the size in network-of-forks style
repositories. In the kernel's fork network (the repository containing
all objects from the kernel and all its forks), deduplicating the
references drops the snapshot size from 934 MB to just 12 MB.

But since we're handling duplicates in this way, we have to make sure
that we preferred references (those listed in pack.preferBitmapTips)
before non-preferred ones (to avoid recording an object which is pointed
at by a preferred tip as non-preferred).

We accomplish this by doing separate passes over the references: first
visiting each prefix in pack.preferBitmapTips, and then over the rest of
the references.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 16:40:09 -07:00
René Scharfe
40bc872adb p0071: test performance of llist_mergesort()
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:09 -07:00
René Scharfe
84edc40676 p0071: measure sorting of already sorted and reversed files
Check if sorting takes advantage of already sorted or reversed content,
or if that corner case actually decreases performance, like it would for
a simplistic quicksort implementation.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:09 -07:00
René Scharfe
f1ed4ce9e3 test-mergesort: add unriffle_skewed mode
Add a mode that turns a sorted list into adversarial input for a
bottom-up mergesort implementation that doubles the length of sorted
sublists at each level -- like our llist_mergesort().

While unriffle mode splits the list in half at each recursion step,
unriffle_skewed splits it into 2^l items and the rest, with 2^l being
the highest power of two smaller than the number of items and thus
2^l >= rest.  The rest is unriffled with the tail of the first half to
require a merge to compare the maximum number of elements.

It complements the unriffle mode, which targets balanced merges.  If
the number of elements is a power of two then both actually produce the
same result, as 2^l == rest == n/2 at each recursion step in that case.

Here are the results:

   $ t/helper/test-tool mergesort test | awk '
      $7 > max[$3] {max[$3] = $7; line[$3] = $0}
      END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
sawtooth  unriffle_skewed       100      128     1184      700      589 OK
sawtooth  unriffle_skewed      1023     1024    16373    10230     9207 OK
sawtooth  unriffle             1024     1024    16384    10240     9217 OK
sawtooth  unriffle_skewed      1025     2048    18454    11275    10241 OK

The sawtooth distribution with m>=n produces a sorted list and
unriffle_skewed mode turns it into adversarial input for unbalanced
merges, which it wins in all cases except for n=1024 -- the resulting
list is the same, but unriffle is tested before unriffle_skewed, so its
result is selected by the AWK script.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:09 -07:00
René Scharfe
1aa589922b test-mergesort: add unriffle mode
Add a mode that turns sorted items into adversarial input for mergesort.
Do that by running mergesort in reverse and rearranging the items in
such a way that each merge needs the maximum number of operations to
undo it.

To riffle is a card shuffling technique and involves splitting a deck
into two and then to interleave them.  A perfect riffle takes one card
from each half in turn.  That's similar to the most expensive merge,
which has to take one item from each sublist in turn, which requires the
maximum number of comparisons (n-1).

So unriffle does that in reverse, i.e. it generates the first sublist
out of the items at even indexes and the second sublist out of the items
at odd indexes, without changing their order in any other way.  Done
recursively until we reach the trivial sublist length of one, this
twists the list into an order that requires the maximum effort for
mergesort to untangle.

As a baseline, here are the rand distributions with the highest number
of comparisons from "test-tool mergesort test":

   $ t/helper/test-tool mergesort test | awk '
      NR > 1 && $1 != "rand" {next}
      $7 > max[$3] {max[$3] = $7; line[$3] = $0}
      END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
rand      copy                  100       32     1184      700      569 OK
rand      reverse_1st_half     1023      256    16373    10230     8976 OK
rand      reverse_1st_half     1024      512    16384    10240     8993 OK
rand      dither               1025       64    18454    11275     9970 OK

And here are the most expensive ones overall:

   $ t/helper/test-tool mergesort test | awk '
      $7 > max[$3] {max[$3] = $7; line[$3] = $0}
      END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
stagger   reverse               100       64     1184      700      580 OK
sawtooth  unriffle             1023     1024    16373    10230     9179 OK
sawtooth  unriffle             1024     1024    16384    10240     9217 OK
stagger   unriffle             1025     2048    18454    11275    10241 OK

The sawtooth distribution with m>=n generates a sorted list.  The
unriffle mode is designed to turn that into adversarial input for
mergesort, and that checks out for n=1023 and n=1024, where it produces
the list that requires the most comparisons.

Item counts that are not powers of two have other winners, and that's
because unriffle recursively splits lists into equal-sized halves, while
llist_mergesort() splits them into the biggest power of two smaller than
n and the rest, e.g. for n=1025 it sorts the first 1024 separately and
finally merges them to the last item.

So unriffle mode works as designed for the intended use case, but to
consistently generate adversarial input for unbalanced merges we need
something else.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
René Scharfe
0cecb75531 test-mergesort: add generate subcommand
Add a subcommand for printing test data.  It can be used to generate
special test cases and feed them into the sort subcommand or sort(1) for
performance measurements.  It may also be useful to illustrate the
effect of distributions, modes and their parameters.

It generates n integers with the specified distribution and its
distribution-specific parameter m.  E.g. m is the maximum value for
the plateau distribution and the length and height of individual teeth
of the sawtooth distribution.

The generated values are printed as zero-padded eight-digit hexadecimal
numbers to make sure alphabetic and numeric order are the same.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
René Scharfe
e031e9719d test-mergesort: add test subcommand
Adapt the qsort certification program from "Engineering a Sort Function"
by Bentley and McIlroy for testing our linked list sort function.  It
generates several lists with various distribution patterns and counts
the number of operations llist_mergesort() needs to order them.  It
compares the result to the output of a trusted sort function (qsort(1))
and also checks if the sort is stable.

Also add a test script that makes use of the new subcommand.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
René Scharfe
d536a71169 test-mergesort: add sort subcommand
Give the code for sorting a text file its own sub-command.  This allows
extending the helper, which we'll do in the following patches.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
René Scharfe
2e6701017e test-mergesort: use strbuf_getline()
Strip line ending characters to make sure empty lines are sorted like
sort(1) does.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
David Aguilar
4ac9f15492 difftool: create a tmpdir path without repeated slashes
The paths generated by difftool are passed to user-facing diff tools.
Using paths with repeated slashes in them is a cosmetic blemish that
is exposed to users and can be avoided.

Use a strbuf to create the buffer used for the dir-diff tmpdir.
Strip trailing slashes from the value read from TMPDIR to avoid
repeated slashes in the generated paths.

Adjust the error handling to avoid leaking strbufs and to avoid
returning -1 to cmd_main().

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-30 18:48:51 -07:00