Commit graph

609 commits

Author SHA1 Message Date
René Scharfe f1ed4ce9e3 test-mergesort: add unriffle_skewed mode
Add a mode that turns a sorted list into adversarial input for a
bottom-up mergesort implementation that doubles the length of sorted
sublists at each level -- like our llist_mergesort().

While unriffle mode splits the list in half at each recursion step,
unriffle_skewed splits it into 2^l items and the rest, with 2^l being
the highest power of two smaller than the number of items and thus
2^l >= rest.  The rest is unriffled with the tail of the first half to
require a merge to compare the maximum number of elements.

It complements the unriffle mode, which targets balanced merges.  If
the number of elements is a power of two then both actually produce the
same result, as 2^l == rest == n/2 at each recursion step in that case.

Here are the results:

   $ t/helper/test-tool mergesort test | awk '
      $7 > max[$3] {max[$3] = $7; line[$3] = $0}
      END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
sawtooth  unriffle_skewed       100      128     1184      700      589 OK
sawtooth  unriffle_skewed      1023     1024    16373    10230     9207 OK
sawtooth  unriffle             1024     1024    16384    10240     9217 OK
sawtooth  unriffle_skewed      1025     2048    18454    11275    10241 OK

The sawtooth distribution with m>=n produces a sorted list and
unriffle_skewed mode turns it into adversarial input for unbalanced
merges, which it wins in all cases except for n=1024 -- the resulting
list is the same, but unriffle is tested before unriffle_skewed, so its
result is selected by the AWK script.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:09 -07:00
René Scharfe 1aa589922b test-mergesort: add unriffle mode
Add a mode that turns sorted items into adversarial input for mergesort.
Do that by running mergesort in reverse and rearranging the items in
such a way that each merge needs the maximum number of operations to
undo it.

To riffle is a card shuffling technique and involves splitting a deck
into two and then to interleave them.  A perfect riffle takes one card
from each half in turn.  That's similar to the most expensive merge,
which has to take one item from each sublist in turn, which requires the
maximum number of comparisons (n-1).

So unriffle does that in reverse, i.e. it generates the first sublist
out of the items at even indexes and the second sublist out of the items
at odd indexes, without changing their order in any other way.  Done
recursively until we reach the trivial sublist length of one, this
twists the list into an order that requires the maximum effort for
mergesort to untangle.

As a baseline, here are the rand distributions with the highest number
of comparisons from "test-tool mergesort test":

   $ t/helper/test-tool mergesort test | awk '
      NR > 1 && $1 != "rand" {next}
      $7 > max[$3] {max[$3] = $7; line[$3] = $0}
      END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
rand      copy                  100       32     1184      700      569 OK
rand      reverse_1st_half     1023      256    16373    10230     8976 OK
rand      reverse_1st_half     1024      512    16384    10240     8993 OK
rand      dither               1025       64    18454    11275     9970 OK

And here are the most expensive ones overall:

   $ t/helper/test-tool mergesort test | awk '
      $7 > max[$3] {max[$3] = $7; line[$3] = $0}
      END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
stagger   reverse               100       64     1184      700      580 OK
sawtooth  unriffle             1023     1024    16373    10230     9179 OK
sawtooth  unriffle             1024     1024    16384    10240     9217 OK
stagger   unriffle             1025     2048    18454    11275    10241 OK

The sawtooth distribution with m>=n generates a sorted list.  The
unriffle mode is designed to turn that into adversarial input for
mergesort, and that checks out for n=1023 and n=1024, where it produces
the list that requires the most comparisons.

Item counts that are not powers of two have other winners, and that's
because unriffle recursively splits lists into equal-sized halves, while
llist_mergesort() splits them into the biggest power of two smaller than
n and the rest, e.g. for n=1025 it sorts the first 1024 separately and
finally merges them to the last item.

So unriffle mode works as designed for the intended use case, but to
consistently generate adversarial input for unbalanced merges we need
something else.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
René Scharfe 0cecb75531 test-mergesort: add generate subcommand
Add a subcommand for printing test data.  It can be used to generate
special test cases and feed them into the sort subcommand or sort(1) for
performance measurements.  It may also be useful to illustrate the
effect of distributions, modes and their parameters.

It generates n integers with the specified distribution and its
distribution-specific parameter m.  E.g. m is the maximum value for
the plateau distribution and the length and height of individual teeth
of the sawtooth distribution.

The generated values are printed as zero-padded eight-digit hexadecimal
numbers to make sure alphabetic and numeric order are the same.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
René Scharfe e031e9719d test-mergesort: add test subcommand
Adapt the qsort certification program from "Engineering a Sort Function"
by Bentley and McIlroy for testing our linked list sort function.  It
generates several lists with various distribution patterns and counts
the number of operations llist_mergesort() needs to order them.  It
compares the result to the output of a trusted sort function (qsort(1))
and also checks if the sort is stable.

Also add a test script that makes use of the new subcommand.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
René Scharfe d536a71169 test-mergesort: add sort subcommand
Give the code for sorting a text file its own sub-command.  This allows
extending the helper, which we'll do in the following patches.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
René Scharfe 2e6701017e test-mergesort: use strbuf_getline()
Strip line ending characters to make sure empty lines are sorted like
sort(1) does.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:08 -07:00
Taylor Blau 6d08b9d4ca builtin/repack.c: make largest pack preferred
When repacking into a geometric series and writing a multi-pack bitmap,
it is beneficial to have the largest resulting pack be the preferred
object source in the bitmap's MIDX, since selecting the large packs can
lead to fewer broken delta chains and better compression.

Teach 'git repack' to identify this pack and pass it to the MIDX write
machinery in order to mark it as preferred.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28 21:20:56 -07:00
Ævar Arnfjörð Bjarmason f69a6e4f07 *.h: move some *_INIT to designated initializers
Move various *_INIT macros to use designated initializers. This helps
readability. I've only picked those leftover macros that were not
touched by another in-flight series of mine which changed others, but
also how initialization was done.

In the case of SUBMODULE_ALTERNATE_SETUP_INIT I've left an explicit
initialization of "error_mode", even though
SUBMODULE_ALTERNATE_ERROR_IGNORE itself is defined as "0". Let's not
peek under the hood and assume that enum fields we know the value of
will stay at "0".

The change to "TESTSUITE_INIT" in "t/helper/test-run-command.c" was
part of an earlier on-list version[1] of c90be786da (test-tool
run-command: fix flip-flop init pattern, 2021-09-11).

1. https://lore.kernel.org/git/patch-1.1-0aa4523ab6e-20210909T130849Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 14:48:00 -07:00
Ævar Arnfjörð Bjarmason 608cfd31cf *.h _INIT macros: don't specify fields equal to 0
Change the initialization of "struct strbuf" changed in
cbc0f81d96 (strbuf: use designated initializers in STRBUF_INIT,
2017-07-10) to omit specifying "alloc" and "len", as we do with other
"alloc" and "len" (or "nr") in similar structs.

Let's likewise omit the explicit initialization of all fields in the
"struct ipc_client_connect_option" struct added in
59c7b88198 (simple-ipc: add win32 implementation, 2021-03-15).

Do the same for a few other initializers, e.g. STRVEC_INIT and
CACHE_DEF_INIT.

Finally, start incrementally changing the same pattern in
"t/helper/test-run-command.c". This change was part of an earlier
on-list version[1] of c90be786da (test-tool run-command: fix
flip-flop init pattern, 2021-09-11).

1. https://lore.kernel.org/git/patch-1.1-0aa4523ab6e-20210909T130849Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 14:47:59 -07:00
Junio C Hamano 0e35107e7d Merge branch 'ab/retire-option-argument'
An oddball OPTION_ARGUMENT feature has been removed from the
parse-options API.

* ab/retire-option-argument:
  parse-options API: remove OPTION_ARGUMENT feature
  difftool: use run_command() API in run_file_diff()
  difftool: prepare "diff" cmdline in cmd_difftool()
  difftool: prepare "struct child_process" in cmd_difftool()
2021-09-23 13:44:48 -07:00
Junio C Hamano ffb0387608 Merge branch 'ab/test-tool-run-command-cleanup'
Code clean-up.

* ab/test-tool-run-command-cleanup:
  test-tool run-command: fix flip-flop init pattern
2021-09-23 13:44:46 -07:00
Ævar Arnfjörð Bjarmason c6b4888b3f environment.c: remove test-specific "ignore_untracked..." variable
Instead of the global ignore_untracked_cache_config variable added in
dae6c322fa (test-dump-untracked-cache: don't modify the untracked
cache, 2016-01-27) we can make use of the new facility to set config
via environment variables added in d8d77153ea (config: allow
specifying config entries via envvar pairs, 2021-01-12).

It's arguably a bit hacky to use setenv() and getenv() to pass
messages between the same program, but since the test helpers are not
the main intended audience of repo-settings.c I think it's better than
hardcoding the test-only special-case in prepare_repo_settings().

This uses the xsetenv() wrapper added in the preceding commit, if we
don't set these in the environment we'll fail in
t7063-status-untracked-cache.sh, but let's fail earlier anyway if that
were to happen.

This breaks any parent process that's potentially using the
GIT_CONFIG_* and GIT_CONFIG_PARAMETERS mechanism to pass one-shot
config setting down to a git subprocess, but in this case we don't
care about the general case of such potential parents. This process
neither spawns other "git" processes, nor is it interested in other
configuration. We might want to pick up other test modes here, but
those will be passed via GIT_TEST_* environment variables.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-22 13:15:00 -07:00
Junio C Hamano 5331af2352 Merge branch 'ab/serve-cleanup'
Code clean-up around "git serve".

* ab/serve-cleanup:
  upload-pack: document and rename --advertise-refs
  serve.[ch]: remove "serve_options", split up --advertise-refs code
  {upload,receive}-pack tests: add --advertise-refs tests
  serve.c: move version line to advertise_capabilities()
  serve: move transfer.advertiseSID check into session_id_advertise()
  serve.[ch]: don't pass "struct strvec *keys" to commands
  serve: use designated initializers
  transport: use designated initializers
  transport: rename "fetch" in transport_vtable to "fetch_refs"
  serve: mark has_capability() as static
2021-09-20 15:20:43 -07:00
Jeff Hostetler 05881a6fc9 t/helper/simple-ipc: convert test-simple-ipc to use start_bg_command
Convert test helper to use `start_bg_command()` when spawning a server
daemon in the background rather than blocks of platform-specific code.

Also, while here, remove _() translation around error messages since
this is a test helper and not Git code.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-20 08:57:58 -07:00
Jeff Hostetler a3e2033e04 simple-ipc: preparations for supporting binary messages.
Add `command_len` argument to the Simple IPC API.

In my original Simple IPC API, I assumed that the request would always
be a null-terminated string of text characters.  The `command`
argument was just a `const char *`.

I found a caller that would like to pass a binary command to the
daemon, so I am amending the Simple IPC API to receive `const char
*command, size_t command_len` arguments.

I considered changing the `command` argument to be a `void *`, but the
IPC layer simply passes it to the pkt-line layer which takes a `const
char *`, so to avoid confusion I left it as is.

Note, the response side has always been a `struct strbuf` which
includes the buffer and length, so we already support returning a
binary answer.  (Yes, it feels a little weird returning a binary
buffer in a `strbuf`, but it works.)

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-20 08:57:58 -07:00
Taylor Blau a05f02b1d9 t/helper/test-bitmap.c: add 'dump-hashes' mode
The pack-bitmap writer code is about to learn how to propagate values
from an existing hash-cache. To prepare, teach the test-bitmap helper to
dump the values from a bitmap's hash-cache extension in order to test
those changes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-14 16:34:17 -07:00
Ævar Arnfjörð Bjarmason 4c25356e0e parse-options API: remove OPTION_ARGUMENT feature
As was noted in 1a85b49b87 (parse-options: make OPT_ARGUMENT() more
useful, 2019-03-14) there's only ever been one user of the
OPT_ARGUMENT(), that user was added in 20de316e33 (difftool: allow
running outside Git worktrees with --no-index, 2019-03-14).

The OPT_ARGUMENT() feature itself was added way back in
580d5bffde (parse-options: new option type to treat an option-like
parameter as an argument., 2008-03-02), but as discussed in
1a85b49b87 wasn't used until 20de316e33 in 2019.

Now that the preceding commit has migrated this code over to using
"struct strvec" to manage the "args" member of a "struct
child_process", we can just use that directly instead of relying on
OPT_ARGUMENT.

This has a minor change in behavior in that if we'll pass --no-index
we'll now always pass it as the first argument, before we'd pass it in
whatever position the caller did. Preserving this was the real value
of OPT_ARGUMENT(), but as it turns out we didn't need that either. We
can always inject it as the first argument, the other end will parse
it just the same.

Note that we cannot remove the "out" and "cpidx" members of "struct
parse_opt_ctx_t" added in 580d5bffde, while they were introduced with
OPT_ARGUMENT() we since used them for other things.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-12 23:27:38 -07:00
Ævar Arnfjörð Bjarmason c90be786da test-tool run-command: fix flip-flop init pattern
In be5d88e112 (test-tool run-command: learn to run (parts of) the
testsuite, 2019-10-04) an init pattern was added that would use
TESTSUITE_INIT, but then promptly memset() everything back to 0. We'd
then set the "dup" on the two string lists.

Our setting of "next" to "-1" thus did nothing, we'd reset it to "0"
before using it. Let's set it to "0" instead, and trust the
"STRING_LIST_INIT_DUP" to set "strdup_strings" appropriately for us.

Note that while we compile this code, there's no in-tree user for the
"testsuite" target being modified here anymore, see the discussion at
and around <nycvar.QRO.7.76.6.2109091323150.59@tvgsbejvaqbjf.bet>[1].

1. https://lore.kernel.org/git/nycvar.QRO.7.76.6.2109091323150.59@tvgsbejvaqbjf.bet/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-12 16:46:54 -07:00
Jonathan Tan 8eb8dcf946 repository: support unabsorbed in repo_submodule_init
In preparation for a subsequent commit that migrates code using
add_submodule_odb() to repo_submodule_init(), teach
repo_submodule_init() to support submodules with unabsorbed gitdirs.
(See the documentation for "git submodule absorbgitdirs" for more
information about absorbed and unabsorbed gitdirs.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 14:09:30 -07:00
Taylor Blau b1b82d1c30 t/helper/test-read-midx.c: add --checksum mode
Subsequent tests will want to check for the existence of a multi-pack
bitmap which matches the multi-pack-index stored in the pack directory.

The multi-pack bitmap includes the hex checksum of the MIDX it
corresponds to in its filename (for example,
'$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
want a way to learn what '<checksum>' is.

This helper addresses that need by printing the checksum of the
repository's multi-pack-index.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Ævar Arnfjörð Bjarmason f234da8019 serve.[ch]: remove "serve_options", split up --advertise-refs code
The "advertise capabilities" mode of serve.c added in
ed10cb952d (serve: introduce git-serve, 2018-03-15) is only used by
the http-backend.c to call {upload,receive}-pack with the
--advertise-refs parameter. See 42526b478e (Add stateless RPC options
to upload-pack, receive-pack, 2009-10-30).

Let's just make cmd_upload_pack() take the two (v2) or three (v2)
parameters the the v2/v1 servicing functions need directly, and pass
those in via the function signature. The logic of whether daemon mode
is implied by the timeout belongs in the v1 function (only used
there).

Once we split up the "advertise v2 refs" from "serve v2 request" it
becomes clear that v2 never cared about those in combination. The only
time it mattered was for v1 to emit its ref advertisement, in that
case we wanted to emit the smart-http-only "no-done" capability.

Since we only do that in the --advertise-refs codepath let's just have
it set "do_done" itself in v1's upload_pack() just before send_ref(),
at that point --advertise-refs and --stateless-rpc in combination are
redundant (the only user is get_info_refs() in http-backend.c), so we
can just pass in --advertise-refs only.

Since we need to touch all the serve() and advertise_capabilities()
codepaths let's rename them to less clever and obvious names, it's
been suggested numerous times, the latest of which is [1]'s suggestion
for protocol_v2_serve_loop(). Let's go with that.

1. https://lore.kernel.org/git/CAFQ2z_NyGb8rju5CKzmo6KhZXD0Dp21u-BbyCb2aNxLEoSPRJw@mail.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-05 08:59:37 -07:00
Junio C Hamano fea3738ac5 Merge branch 'ab/getcwd-test'
Portability test update.

* ab/getcwd-test:
  t0001: fix broken not-quite getcwd(3) test in bed67874e2
2021-08-04 13:28:55 -07:00
Ævar Arnfjörð Bjarmason 482e1488a9 t0001: fix broken not-quite getcwd(3) test in bed67874e2
With a54e938e5b (strbuf: support long paths w/o read rights in
strbuf_getcwd() on FreeBSD, 2017-03-26) we had t0001 break on systems
like OpenBSD and AIX whose getcwd(3) has standard (but not like glibc
et al) behavior.

This was partially fixed in bed67874e2 (t0001: skip test with
restrictive permissions if getpwd(3) respects them, 2017-08-07).

The problem with that fix is that while its analysis of the problem is
correct, it doesn't actually call getcwd(3), instead it invokes "pwd
-P". There is no guarantee that "pwd -P" is going to call getcwd(3),
as opposed to e.g. being a shell built-in.

On AIX under both bash and ksh this test breaks because "pwd -P" will
happily display the current working directory, but getcwd(3) called by
the "git init" we're testing here will fail to get it.

I checked whether clobbering the $PWD environment variable would
affect it, and it didn't. Presumably these shells keep track of their
working directory internally.

There's possible follow-up work here in teaching strbuf_getcwd() to
get the working directory with whatever method "pwd" uses on these
platforms. See [1] for a discussion of that, but let's take the easy
way out here and just skip these tests by fixing the
GETCWD_IGNORES_PERMS prerequisite to match the limitations of
strbuf_getcwd().

1. https://lore.kernel.org/git/b650bef5-d739-d98d-e9f1-fa292b6ce982@web.de/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30 10:18:27 -07:00
Junio C Hamano 5bae927222 Merge branch 'ab/pkt-line-tests'
Tests that cover protocol bits have been updated and helpers
used there have been consolidated.

* ab/pkt-line-tests:
  test-lib-functions: use test-tool for [de]packetize()
2021-07-28 13:18:00 -07:00
Junio C Hamano dd6d3c90ee Merge branch 'ab/attribute-format'
Many "printf"-like helper functions we have have been annotated
with __attribute__() to catch placeholder/parameter mismatches.

* ab/attribute-format:
  advice.h: add missing __attribute__((format)) & fix usage
  *.h: add a few missing __attribute__((format))
  *.c static functions: add missing __attribute__((format))
  sequencer.c: move static function to avoid forward decl
  *.c static functions: don't forward-declare __attribute__
2021-07-28 13:17:59 -07:00
Junio C Hamano e5cc59c77c Merge branch 'ew/many-alternate-optim'
Optimization for repositories with many alternate object store.

* ew/many-alternate-optim:
  oidtree: a crit-bit tree for odb_loose_cache
  oidcpy_with_padding: constify `src' arg
  make object_directory.loose_objects_subdir_seen a bitmap
  avoid strlen via strbuf_addstr in link_alt_odb_entry
  speed up alt_odb_usable() with many alternates
2021-07-28 13:17:57 -07:00
Ævar Arnfjörð Bjarmason 64f0109f17 test-lib-functions: use test-tool for [de]packetize()
The shell+perl "[de]packetize()" helper functions were added in
4414a15002 (t/lib-git-daemon: add network-protocol helpers,
2018-01-24), and around the same time we added the "pkt-line" helper
in 74e7002961 (test-pkt-line: introduce a packet-line test helper,
2018-03-14).

For some reason it seems we've mostly used the shell+perl version
instead of the helper since then. There were discussions around
88124ab263 (test-lib-functions: make packetize() more efficient,
2020-03-27) and cacae4329f (test-lib-functions: simplify packetize()
stdin code, 2020-03-29) to improve them and make them more efficient.

There was one good reason to do so, we needed an equivalent of
"test-tool pkt-line pack", but that command wasn't capable of handling
input with "\n" (a feature) or "\0" (just because it happens to be
printf-based under the hood).

Let's add a "pkt-line-raw" helper for that, and expose is at a
packetize_raw() to go with the existing packetize() on the shell
level, this gives us the smallest amount of change to the tests
themselves.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-19 11:53:50 -07:00
Junio C Hamano 8721e2eaed Merge branch 'jt/partial-clone-submodule-1'
Prepare the internals for lazily fetching objects in submodules
from their promisor remotes.

* jt/partial-clone-submodule-1:
  promisor-remote: teach lazy-fetch in any repo
  run-command: refactor subprocess env preparation
  submodule: refrain from filtering GIT_CONFIG_COUNT
  promisor-remote: support per-repository config
  repository: move global r_f_p_c to repo struct
2021-07-16 17:42:53 -07:00
Junio C Hamano c9780bb2ca Merge branch 'hn/prep-tests-for-reftable'
Preliminary clean-up of tests before the main reftable changes
hits the codebase.

* hn/prep-tests-for-reftable: (22 commits)
  t1415: set REFFILES for test specific to storage format
  t4202: mark bogus head hash test with REFFILES
  t7003: check reflog existence only for REFFILES
  t7900: stop checking for loose refs
  t1404: mark tests that muck with .git directly as REFFILES.
  t2017: mark --orphan/logAllRefUpdates=false test as REFFILES
  t1414: mark corruption test with REFFILES
  t1407: require REFFILES for for_each_reflog test
  test-lib: provide test prereq REFFILES
  t5304: use "reflog expire --all" to clear the reflog
  t5304: restyle: trim empty lines, drop ':' before >
  t7003: use rev-parse rather than FS inspection
  t5000: inspect HEAD using git-rev-parse
  t5000: reformat indentation to the latest fashion
  t1301: fix typo in error message
  t1413: use tar to save and restore entire .git directory
  t1401-symbolic-ref: avoid direct filesystem access
  t1401: use tar to snapshot and restore repo state
  t5601: read HEAD using rev-parse
  t9300: check ref existence using test-helper rather than a file system check
  ...
2021-07-13 16:52:50 -07:00
Ævar Arnfjörð Bjarmason 927dc33070 advice.h: add missing __attribute__((format)) & fix usage
Add the missing __attribute__((format)) checking to
advise_if_enabled(). This revealed a trivial issue introduced in
b3b18d1621 (advice: revamp advise API, 2020-03-02). We treated the
argv[1] as a format string, but did not intend to do so. Let's use
"%s" and pass argv[1] as an argument instead.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-13 15:20:20 -07:00
Junio C Hamano e867110340 Merge branch 'ab/cmd-foo-should-return'
Code clean-up.

* ab/cmd-foo-should-return:
  builtins + test helpers: use return instead of exit() in cmd_*
2021-07-08 13:15:04 -07:00
Eric Wong 92d8ed8ac1 oidtree: a crit-bit tree for odb_loose_cache
This saves 8K per `struct object_directory', meaning it saves
around 800MB in my case involving 100K alternates (half or more
of those alternates are unlikely to hold loose objects).

This is implemented in two parts: a generic, allocation-free
`cbtree' and the `oidtree' wrapper on top of it.  The latter
provides allocation using alloc_state as a memory pool to
improve locality and reduce free(3) overhead.

Unlike oid-array, the crit-bit tree does not require sorting.
Performance is bound by the key length, for oidtree that is
fixed at sizeof(struct object_id).  There's no need to have
256 oidtrees to mitigate the O(n log n) overhead like we did
with oid-array.

Being a prefix trie, it is natively suited for expanding short
object IDs via prefix-limited iteration in
`find_short_object_filename'.

On my busy workstation, p4205 performance seems to be roughly
unchanged (+/-8%).  Startup with 100K total alternates with no
loose objects seems around 10-20% faster on a hot cache.
(800MB in memory savings means more memory for the kernel FS
cache).

The generic cbtree implementation does impose some extra
overhead for oidtree in that it uses memcmp(3) on
"struct object_id" so it wastes cycles comparing 12 extra bytes
on SHA-1 repositories.  I've not yet explored reducing this
overhead, but I expect there are many places in our code base
where we'd want to investigate this.

More information on crit-bit trees: https://cr.yp.to/critbit.html

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-07 21:28:04 -07:00
Jonathan Tan ef830cc434 promisor-remote: teach lazy-fetch in any repo
This is one step towards supporting partial clone submodules.

Even after this patch, we will still lack partial clone submodules
support, primarily because a lot of Git code that accesses submodule
objects does so by adding their object stores as alternates, meaning
that any lazy fetches that would occur in the submodule would be done
based on the config of the superproject, not of the submodule. This also
prevents testing of the functionality in this patch by user-facing
commands. So for now, test this mechanism using a test helper.

Besides that, there is some code that uses the wrapper functions
like has_promisor_remote(). Those will need to be checked to see if they
could support the non-wrapper functions instead (and thus support any
repository, not just the_repository).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-28 09:58:01 -07:00
Junio C Hamano 169914ede2 Merge branch 'en/ort-perf-batch-11'
Optimize out repeated rename detection in a sequence of mergy
operations.

* en/ort-perf-batch-11:
  merge-ort, diffcore-rename: employ cached renames when possible
  merge-ort: handle interactions of caching and rename/rename(1to1) cases
  merge-ort: add helper functions for using cached renames
  merge-ort: preserve cached renames for the appropriate side
  merge-ort: avoid accidental API mis-use
  merge-ort: add code to check for whether cached renames can be reused
  merge-ort: populate caches of rename detection results
  merge-ort: add data structures for in-memory caching of rename detection
  t6429: testcases for remembering renames
  fast-rebase: write conflict state to working tree, index, and HEAD
  fast-rebase: change assert() to BUG()
  Documentation/technical: describe remembering renames optimization
  t6423: rename file within directory that other side renamed
2021-06-14 13:33:27 +09:00
Ævar Arnfjörð Bjarmason 338abb0f04 builtins + test helpers: use return instead of exit() in cmd_*
Change various cmd_* functions that claim to return an "int" to use
"return" instead of exit() to indicate an exit code. These were not
marked with NORETURN, and by directly exit()-ing we'll skip the
cleanup git.c would otherwise do (e.g. closing fd's, erroring if we
can't). See run_builtin() in git.c.

In the case of shell.c and sh-i18n--envsubst.c this was the result of
an incomplete migration to using a cmd_main() in 3f2e2297b9 (add an
extra level of indirection to main(), 2016-07-01).

This was spotted by SunCC 12.5 on Solaris 10 (gcc210 on the gccfarm).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-09 09:15:58 +09:00
Han-Wen Nienhuys 0221eb8678 t/helper/ref-store: initialize oid in resolve-ref
This will print $ZERO_OID when asking for a non-existent ref from the
test-helper.

Since resolve-ref provides direct access to refs_resolve_ref_unsafe(), it
provides a reliable mechanism for accessing REFNAME, while avoiding the implicit
resolution to refs/heads/REFNAME.

Reviewed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-02 10:01:54 +09:00
Elijah Newren f9500261e0 fast-rebase: write conflict state to working tree, index, and HEAD
Previously, when fast-rebase hit a conflict, it simply aborted and left
HEAD, the index, and the working tree where they were before the
operation started.  While fast-rebase does not support restarting from a
conflicted state, write the conflicted state out anyway as it gives us a
way to see what the conflicts are and write tests that check for them.

This will be important in the upcoming commits, because sequencer.c is
only superficially integrated with merge-ort.c; in particular, it calls
merge_switch_to_result() after EACH merge instead of only calling it at
the end of all the sequence of merges (or when a conflict is hit).  This
not only causes needless updates to the working copy and index, but also
causes all intermediate data to be freed and tossed, preventing caching
information from one merge to the next.  However, integrating
sequencer.c more deeply with merge-ort.c is a big task, and making this
small extension to fast-rebase.c provides us with a simple way to test
the edge and corner cases that we want to make sure continue working.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 15:40:39 +09:00
Elijah Newren caba91c373 fast-rebase: change assert() to BUG()
assert() can succinctly document expectations for the code, and do so in
a way that may be useful to future folks trying to refactor the code and
change basic assumptions; it allows them to more quickly find some
places where their violations of previous assumptions trips things up.

Unfortunately, assert() can surround a function call with important
side-effects, which is a huge mistake since some users will compile with
assertions disabled.  I've had to debug such mistakes before in other
codebases, so I should know better.  Luckily, this was only in test
code, but it's still very embarrassing.  Change an assert() to an if
(...) BUG (...).

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 15:40:39 +09:00
Junio C Hamano 416449eaba Merge branch 'jk/symlinked-dotgitx-cleanup'
Various test and documentation updates about .gitsomething paths
that are symlinks.

* jk/symlinked-dotgitx-cleanup:
  docs: document symlink restrictions for dot-files
  fsck: warn about symlinked dotfiles we'll open with O_NOFOLLOW
  t0060: test ntfs/hfs-obscured dotfiles
  t7450: test .gitmodules symlink matching against obscured names
  t7450: test verify_path() handling of gitmodules
  t7415: rename to expand scope
  fsck_tree(): wrap some long lines
  fsck_tree(): fix shadowed variable
  t7415: remove out-dated comment about translation
2021-05-11 15:27:23 +09:00
Junio C Hamano aaa3c8065d Merge branch 'bc/hash-transition-interop-part-1'
SHA-256 transition.

* bc/hash-transition-interop-part-1:
  hex: print objects using the hash algorithm member
  hex: default to the_hash_algo on zero algorithm value
  builtin/pack-objects: avoid using struct object_id for pack hash
  commit-graph: don't store file hashes as struct object_id
  builtin/show-index: set the algorithm for object IDs
  hash: provide per-algorithm null OIDs
  hash: set, copy, and use algo field in struct object_id
  builtin/pack-redundant: avoid casting buffers to struct object_id
  Use the final_oid_fn to finalize hashing of object IDs
  hash: add a function to finalize object IDs
  http-push: set algorithm when reading object ID
  Always use oidread to read into struct object_id
  hash: add an algo member to struct object_id
2021-05-10 16:59:46 +09:00
Jeff King 801ed010bf t0060: test ntfs/hfs-obscured dotfiles
We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).

Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used in some fsck checks, let's make sure they actually work by
throwing a few basic tests at them. Likewise, let's cover .mailmap
(which does need matching code added).

I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.

Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-04 11:52:02 +09:00
Junio C Hamano 8e97852919 Merge branch 'ds/sparse-index-protections'
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.

* ds/sparse-index-protections: (47 commits)
  name-hash: use expand_to_path()
  sparse-index: expand_to_path()
  name-hash: don't add directories to name_hash
  revision: ensure full index
  resolve-undo: ensure full index
  read-cache: ensure full index
  pathspec: ensure full index
  merge-recursive: ensure full index
  entry: ensure full index
  dir: ensure full index
  update-index: ensure full index
  stash: ensure full index
  rm: ensure full index
  merge-index: ensure full index
  ls-files: ensure full index
  grep: ensure full index
  fsck: ensure full index
  difftool: ensure full index
  commit: ensure full index
  checkout: ensure full index
  ...
2021-04-30 13:50:26 +09:00
Junio C Hamano 13158b9910 Merge branch 'jk/promisor-optim'
Handling of "promisor packs" that allows certain objects to be
missing and lazily retrievable has been optimized (a bit).

* jk/promisor-optim:
  revision: avoid parsing with --exclude-promisor-objects
  lookup_unknown_object(): take a repository argument
  is_promisor_object(): free tree buffer after parsing
2021-04-30 13:50:24 +09:00
brian m. carlson 14228447c9 hash: provide per-algorithm null OIDs
Up until recently, object IDs did not have an algorithm member, only a
hash.  Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms.  Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.

Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27 16:31:39 +09:00
Junio C Hamano ab99efc817 Merge branch 'ab/userdiff-tests'
A bit of code clean-up and a lot of test clean-up around userdiff
area.

* ab/userdiff-tests:
  blame tests: simplify userdiff driver test
  blame tests: don't rely on t/t4018/ directory
  userdiff: remove support for "broken" tests
  userdiff tests: list builtin drivers via test-tool
  userdiff tests: explicitly test "default" pattern
  userdiff: add and use for_each_userdiff_driver()
  userdiff style: normalize pascal regex declaration
  userdiff style: declare patterns with consistent style
  userdiff style: re-order drivers in alphabetical order
2021-04-20 17:23:34 -07:00
Junio C Hamano 8446b388b1 Merge branch 'cc/test-helper-bloom-usage-fix'
Usage message fix for a test helper.

* cc/test-helper-bloom-usage-fix:
  test-bloom: fix missing 'bloom' from usage string
2021-04-13 15:28:52 -07:00
Junio C Hamano 0623669fc6 Merge branch 'tb/pack-preferred-tips-to-give-bitmap'
A configuration variable has been added to force tips of certain
refs to be given a reachability bitmap.

* tb/pack-preferred-tips-to-give-bitmap:
  builtin/pack-objects.c: respect 'pack.preferBitmapTips'
  t/helper/test-bitmap.c: initial commit
  pack-bitmap: add 'test_bitmap_commits()' helper
2021-04-13 15:28:50 -07:00
Jeff King 45a187cc34 lookup_unknown_object(): take a repository argument
All of the other lookup_foo() functions take a repository argument, but
lookup_unknown_object() was never converted, and it uses the_repository
internally. Let's fix that.

We could leave a wrapper that uses the_repository, but there aren't that
many calls, so we'll just convert them all. I looked briefly at each
site to see if we had a repository struct (besides the_repository) we
could pass, but none of them do (so this conversion to pass
the_repository is a pure noop in each case, though it does take us one
step closer to eventually getting rid of the_repository).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-13 13:18:46 -07:00
Junio C Hamano e6b971fcf5 Merge branch 'tb/reverse-midx'
An on-disk reverse-index to map the in-pack location of an object
back to its object name across multiple packfiles is introduced.

* tb/reverse-midx:
  midx.c: improve cache locality in midx_pack_order_cmp()
  pack-revindex: write multi-pack reverse indexes
  pack-write.c: extract 'write_rev_file_order'
  pack-revindex: read multi-pack reverse indexes
  Documentation/technical: describe multi-pack reverse indexes
  midx: make some functions non-static
  midx: keep track of the checksum
  midx: don't free midx_name early
  midx: allow marking a pack as preferred
  t/helper/test-read-midx.c: add '--show-objects'
  builtin/multi-pack-index.c: display usage on unrecognized command
  builtin/multi-pack-index.c: don't enter bogus cmd_mode
  builtin/multi-pack-index.c: split sub-commands
  builtin/multi-pack-index.c: define common usage with a macro
  builtin/multi-pack-index.c: don't handle 'progress' separately
  builtin/multi-pack-index.c: inline 'flags' with options
2021-04-08 13:23:25 -07:00
Ævar Arnfjörð Bjarmason 28e8f0d5e5 userdiff tests: list builtin drivers via test-tool
Change the userdiff test to list the builtin drivers via the
test-tool, using the new for_each_userdiff_driver() API function.

This gets rid of the need to modify this part of the test every time a
new pattern is added, see 2ff6c34612 (userdiff: support Bash,
2020-10-22) and 09dad9256a (userdiff: support Markdown, 2020-05-02)
for two recent examples.

I only need the "list-builtin-drivers "argument here, but let's add
"list-custom-drivers" and "list-drivers" too, just because it's easy.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-08 12:19:10 -07:00
Christian Couder dba94e3a85 test-bloom: fix missing 'bloom' from usage string
Like 'get_murmur3' and 'generate_filter', 'get_filter_for_commit' is a
subcommand of `test-tool bloom` not of `test-tool` itself.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-05 22:54:34 -07:00
Junio C Hamano 861794b60d Merge branch 'jh/simple-ipc'
A simple IPC interface gets introduced to build services like
fsmonitor on top.

* jh/simple-ipc:
  t0052: add simple-ipc tests and t/helper/test-simple-ipc tool
  simple-ipc: add Unix domain socket implementation
  unix-stream-server: create unix domain socket under lock
  unix-socket: disallow chdir() when creating unix domain sockets
  unix-socket: add backlog size option to unix_stream_listen()
  unix-socket: eliminate static unix_stream_socket() helper function
  simple-ipc: add win32 implementation
  simple-ipc: design documentation for new IPC mechanism
  pkt-line: add options argument to read_packetized_to_strbuf()
  pkt-line: add PACKET_READ_GENTLE_ON_READ_ERROR option
  pkt-line: do not issue flush packets in write_packetized_*()
  pkt-line: eliminate the need for static buffer in packet_write_gently()
2021-04-02 14:43:14 -07:00
Taylor Blau 483fa7f42d t/helper/test-bitmap.c: initial commit
Add a new 'bitmap' test-tool which can be used to list the commits that
have received bitmaps.

In theory, a determined tester could run 'git rev-list --test-bitmap
<commit>' to check if '<commit>' received a bitmap or not, since
'--test-bitmap' exits with a non-zero code when it can't find the
requested commit.

But this is a dubious behavior to rely on, since arguably 'git
rev-list' could continue its object walk outside of which commits are
covered by bitmaps.

This will be used to test the behavior of 'pack.preferBitmapTips', which
will be added in the following patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-31 23:14:03 -07:00
Derrick Stolee 2782db3eed test-tool: don't force full index
We will use 'test-tool read-cache --table' to check that a sparse
index is written as part of init_repos. Since we will no longer always
expand a sparse index into a full index, add an '--expand' parameter
that adds a call to ensure_full_index() so we can compare a sparse index
directly against a full index, or at least what the in-memory index
looks like when expanded in this way.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:57:46 -07:00
Derrick Stolee e2df6c3972 test-read-cache: print cache entries with --table
This table is helpful for discovering data in the index to ensure it is
being written correctly, especially as we build and test the
sparse-index. This table includes an output format similar to 'git
ls-tree', but should not be compared to that directly. The biggest
reasons are that 'git ls-tree' includes a tree entry for every
subdirectory, even those that would not appear as a sparse directory in
a sparse-index. Further, 'git ls-tree' does not use a trailing directory
separator for its tree rows.

This does not print the stat() information for the blobs. That will be
added in a future change with another option. The tests that are added
in the next few changes care only about the object types and IDs.
However, this future need for full index information justifies the need
for this test helper over extending a user-facing feature, such as 'git
ls-files'.

To make the option parsing slightly more robust, wrap the string
comparisons in a loop adapted from test-dir-iterator.c.

Care must be taken with the final check for the 'cnt' variable. We
continue the expectation that the numerical value is the final argument.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:57:46 -07:00
Taylor Blau 86d174b724 t/helper/test-read-midx.c: add '--show-objects'
The 'read-midx' helper is used in places like t5319 to display basic
information about a multi-pack-index.

In the next patch, the MIDX writing machinery will learn a new way to
choose from which pack an object is selected when multiple copies of
that object exist.

To disambiguate which pack introduces an object so that this feature can
be tested, add a '--show-objects' option which displays additional
information about each object in the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:16:56 -07:00
Junio C Hamano 858119f6d7 Merge branch 'nk/diff-index-fsmonitor'
"git diff-index" codepath has been taught to trust fsmonitor status
to reduce number of lstat() calls.

* nk/diff-index-fsmonitor:
  fsmonitor: add perf test for git diff HEAD
  fsmonitor: add assertion that fsmonitor is valid to check_removed
  fsmonitor: skip lstat deletion check during git diff-index
2021-03-24 14:36:27 -07:00
Jeff Hostetler 36a7eb6876 t0052: add simple-ipc tests and t/helper/test-simple-ipc tool
Create t0052-simple-ipc.sh with unit tests for the "simple-ipc" mechanism.

Create t/helper/test-simple-ipc test tool to exercise the "simple-ipc"
functions.

When the tool is invoked with "run-daemon", it runs a server to listen
for "simple-ipc" connections on a test socket or named pipe and
responds to a set of commands to exercise/stress the communication
setup.

When the tool is invoked with "start-daemon", it spawns a "run-daemon"
command in the background and waits for the server to become ready
before exiting.  (This helps make unit tests in t0052 more predictable
and avoids the need for arbitrary sleeps in the test script.)

The tool also has a series of client "send" commands to send commands
and data to a server instance.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-22 11:52:54 -07:00
Junio C Hamano eabacfd9cb Merge branch 'jc/calloc-fix'
Code clean-up.

* jc/calloc-fix:
  xcalloc: use CALLOC_ARRAY() when applicable
2021-03-19 15:25:37 -07:00
Nipunn Koorapati 7e5aa13d2c fsmonitor: add perf test for git diff HEAD
Update the xargs call so that if your large repo contains
symlinks, test-tool chmtime failure does not end the script.

On Linux
Test                                                          this tree           upstream/master
---------------------------------------------------------------------------------------------------------
7519.4: status (fsmonitor=fsmonitor-watchman)                 0.52(0.43+0.10)     0.53(0.49+0.05) +1.9%
7519.5: status -uno (fsmonitor=fsmonitor-watchman)            0.21(0.15+0.07)     0.22(0.13+0.09) +4.8%
7519.6: status -uall (fsmonitor=fsmonitor-watchman)           1.65(0.93+0.71)     1.69(1.03+0.65) +2.4%
7519.7: status (dirty) (fsmonitor=fsmonitor-watchman)         11.99(11.34+1.58)   11.95(11.02+1.79) -0.3%
7519.8: diff (fsmonitor=fsmonitor-watchman)                   0.25(0.17+0.26)     0.25(0.18+0.26) +0.0%
7519.9: diff HEAD (fsmonitor=fsmonitor-watchman)              0.39(0.25+0.34)     0.89(0.35+0.74) +128.2%
7519.10: diff -- 0_files (fsmonitor=fsmonitor-watchman)       0.16(0.13+0.04)     0.16(0.12+0.05) +0.0%
7519.11: diff -- 10_files (fsmonitor=fsmonitor-watchman)      0.16(0.12+0.05)     0.16(0.12+0.05) +0.0%
7519.12: diff -- 100_files (fsmonitor=fsmonitor-watchman)     0.16(0.12+0.05)     0.16(0.12+0.05) +0.0%
7519.13: diff -- 1000_files (fsmonitor=fsmonitor-watchman)    0.16(0.11+0.06)     0.16(0.12+0.05) +0.0%
7519.14: diff -- 10000_files (fsmonitor=fsmonitor-watchman)   0.18(0.13+0.06)     0.17(0.10+0.08) -5.6%
7519.15: add (fsmonitor=fsmonitor-watchman)                   2.25(1.53+0.68)     2.25(1.47+0.74) +0.0%
7519.18: status (fsmonitor=disabled)                          0.88(0.73+1.03)     0.89(0.67+1.08) +1.1%
7519.19: status -uno (fsmonitor=disabled)                     0.45(0.43+0.89)     0.45(0.34+0.98) +0.0%
7519.20: status -uall (fsmonitor=disabled)                    1.88(1.16+1.58)     1.88(1.22+1.51) +0.0%
7519.21: status (dirty) (fsmonitor=disabled)                  7.53(7.05+2.11)     7.53(6.98+2.04) +0.0%
7519.22: diff (fsmonitor=disabled)                            0.42(0.37+0.92)     0.42(0.38+0.91) +0.0%
7519.23: diff HEAD (fsmonitor=disabled)                       0.44(0.41+0.90)     0.44(0.40+0.91) +0.0%
7519.24: diff -- 0_files (fsmonitor=disabled)                 0.13(0.09+0.05)     0.13(0.09+0.05) +0.0%
7519.25: diff -- 10_files (fsmonitor=disabled)                0.13(0.10+0.04)     0.13(0.10+0.04) +0.0%
7519.26: diff -- 100_files (fsmonitor=disabled)               0.13(0.09+0.05)     0.13(0.10+0.04) +0.0%
7519.27: diff -- 1000_files (fsmonitor=disabled)              0.13(0.09+0.06)     0.13(0.09+0.05) +0.0%
7519.28: diff -- 10000_files (fsmonitor=disabled)             0.14(0.11+0.05)     0.14(0.10+0.05) +0.0%
7519.29: add (fsmonitor=disabled)                             2.43(1.61+1.64)     2.43(1.69+1.57) +0.0%

On linux (2.29.2 vs w/ this patch):
nipunn@nipunn-dbx:~/src/server3$ strace -f -c git diff 2>&1 | grep lstat
  0.04    0.000063           3        20         6 lstat
nipunn@nipunn-dbx:~/src/server3$ strace -f -c git diff HEAD 2>&1 | grep lstat
 94.98    5.242262          10    523783        13 lstat
nipunn@nipunn-dbx:~/src/server3$ strace -f -c ../git/bin-wrappers/git diff 2>&1 | grep lstat
  0.38    0.000032           5         7         3 lstat
nipunn@nipunn-dbx:~/src/server3$ strace -f -c ../git/bin-wrappers/git diff HEAD 2>&1 | grep lstat
 99.44    0.741892           9     81634        10 lstat

On mac (2.29.2 vs w/ this patch):
nipunn-mbp:server nipunn$ sudo dtruss -L -f -c git diff 2>&1 | grep "^lstat64 "
lstat64                                         8
nipunn-mbp:server nipunn$ sudo dtruss -L -f -c git diff HEAD 2>&1 | grep "^lstat64 "
lstat64                                    120242
nipunn-mbp:server nipunn$ sudo dtruss -L -f -c ../git/bin-wrappers/git diff 2>&1 | grep "^lstat64 "
lstat64                                         4
nipunn-mbp:server nipunn$ sudo dtruss -L -f -c ../git/bin-wrappers/git diff HEAD 2>&1 | grep "^lstat64 "
lstat64                                      4497

There are still a bunch of lstats - on directories, but not every file. Progress!

Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18 13:31:14 -07:00
Junio C Hamano 486f4bd183 xcalloc: use CALLOC_ARRAY() when applicable
These are for codebase before Git 2.31

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-15 17:51:10 -07:00
Junio C Hamano aa2d3dbdf5 Merge branch 'jt/trace2-BUG'
Even though invocations of "die()" were logged to the trace2
system, "BUG()"s were not, which has been corrected.

* jt/trace2-BUG:
  usage: trace2 BUG() invocations
2021-02-17 17:21:41 -08:00
Junio C Hamano 8b4701ae4f Merge branch 'ak/corrected-commit-date'
The commit-graph learned to use corrected commit dates instead of
the generation number to help topological revision traversal.

* ak/corrected-commit-date:
  doc: add corrected commit date info
  commit-reach: use corrected commit dates in paint_down_to_common()
  commit-graph: use generation v2 only if entire chain does
  commit-graph: implement generation data chunk
  commit-graph: implement corrected commit date
  commit-graph: return 64-bit generation number
  commit-graph: add a slab to store topological levels
  t6600-test-reach: generalize *_three_modes
  commit-graph: consolidate fill_commit_graph_info
  revision: parse parent in indegree_walk_step()
  commit-graph: fix regression when computing Bloom filters
2021-02-17 17:21:40 -08:00
Junio C Hamano 59ace284f3 Merge branch 'ab/grep-pcre-invalid-utf8'
Update support for invalid UTF-8 in PCRE2.

* ab/grep-pcre-invalid-utf8:
  grep/pcre2: better support invalid UTF-8 haystacks
  grep/pcre2 tests: don't rely on invalid UTF-8 data test
2021-02-10 14:48:33 -08:00
Jonathan Tan 0a9dde4a04 usage: trace2 BUG() invocations
die() messages are traced in trace2, but BUG() messages are not. Anyone
tracking die() messages would have even more reason to track BUG().
Therefore, write to trace2 when BUG() is invoked.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-09 14:14:34 -08:00
Ævar Arnfjörð Bjarmason 95ca1f987e grep/pcre2: better support invalid UTF-8 haystacks
Improve the support for invalid UTF-8 haystacks given a non-ASCII
needle when using the PCREv2 backend.

This is a more complete fix for a bug I started to fix in
870eea8166 (grep: do not enter PCRE2_UTF mode on fixed matching,
2019-07-26), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we
can make use of it.

This fixes the sort of case described in 8a5999838e (grep: stess test
PCRE v2 on invalid UTF-8 data, 2019-07-26), i.e.:

    - The subject string is non-ASCII (e.g. "ævar")
    - We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C"
    - We are using --ignore-case, or we're a non-fixed pattern

If those conditions were satisfied and we matched found non-valid
UTF-8 data PCREv2 might bark on it, in practice this only happened
under the JIT backend (turned on by default on most platforms).

Ultimately this fixes a "regression" in b65abcafc7 ("grep: use PCRE v2
for optimized fixed-string search", 2019-07-01), I'm putting that in
scare-quotes because before then we wouldn't properly support these
complex case-folding, locale etc. cases either, it just broke in
different ways.

There was a bug related to this the PCRE2_NO_START_OPTIMIZE flag fixed
in PCREv2 10.36. It can be worked around by setting the
PCRE2_NO_START_OPTIMIZE flag. Let's do that in those cases, and add
tests for the bug.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-24 16:09:17 -08:00
Jeff King 36a317929b refs: switch peel_ref() to peel_iterated_oid()
The peel_ref() interface is confusing and error-prone:

  - it's typically used by ref iteration callbacks that have both a
    refname and oid. But since they pass only the refname, we may load
    the ref value from the filesystem again. This is inefficient, but
    also means we are open to a race if somebody simultaneously updates
    the ref. E.g., this:

      int some_ref_cb(const char *refname, const struct object_id *oid, ...)
      {
              if (!peel_ref(refname, &peeled))
                      printf("%s peels to %s",
                             oid_to_hex(oid), oid_to_hex(&peeled);
      }

    could print nonsense. It is correct to say "refname peels to..."
    (you may see the "before" value or the "after" value, either of
    which is consistent), but mentioning both oids may be mixing
    before/after values.

    Worse, whether this is possible depends on whether the optimization
    to read from the current iterator value kicks in. So it is actually
    not possible with:

      for_each_ref(some_ref_cb);

    but it _is_ possible with:

      head_ref(some_ref_cb);

    which does not use the iterator mechanism (though in practice, HEAD
    should never peel to anything, so this may not be triggerable).

  - it must take a fully-qualified refname for the read_ref_full() code
    path to work. Yet we routinely pass it partial refnames from
    callbacks to for_each_tag_ref(), etc. This happens to work when
    iterating because there we do not call read_ref_full() at all, and
    only use the passed refname to check if it is the same as the
    iterator. But the requirements for the function parameters are quite
    unclear.

Instead of taking a refname, let's instead take an oid. That fixes both
problems. It's a little funny for a "ref" function not to involve refs
at all. The key thing is that it's optimizing under the hood based on
having access to the ref iterator. So let's change the name to make it
clear why you'd want this function versus just peel_object().

There are two other directions I considered but rejected:

  - we could pass the peel information into the each_ref_fn callback.
    However, we don't know if the caller actually wants it or not. For
    packed-refs, providing it is essentially free. But for loose refs,
    we actually have to peel the object, which would be wasteful in most
    cases. We could likewise pass in a flag to the callback indicating
    whether the peeled information is known, but that complicates those
    callbacks, as they then have to decide whether to manually peel
    themselves. Plus it requires changing the interface of every
    callback, whether they care about peeling or not, and there are many
    of them.

  - we could make a function to return the peeled value of the current
    iterated ref (computing it if necessary), and BUG() otherwise. I.e.:

      int peel_current_iterated_ref(struct object_id *out);

    Each of the current callers is an each_ref_fn callback, so they'd
    mostly be happy. But:

      - we use those callbacks with functions like head_ref(), which do
        not use the iteration code. So we'd need to handle the fallback
        case there, anyway.

      - it's possible that a caller would want to call into generic code
        that sometimes is used during iteration and sometimes not. This
        encapsulates the logic to do the fast thing when possible, and
        fallback when necessary.

The implementation is mostly obvious, but I want to call out a few
things in the patch:

  - the test-tool coverage for peel_ref() is now meaningless, as it all
    collapses to a single peel_object() call (arguably they were pretty
    uninteresting before; the tricky part of that function is the
    fast-path we see during iteration, but these calls didn't trigger
    that). I've just dropped it entirely, though note that some other
    tests relied on the tags we created; I've moved that creation to the
    tests where it matters.

  - we no longer need to take a ref_store parameter, since we'd never
    look up a ref now. We do still rely on a global "current iterator"
    variable which _could_ be kept per-ref-store. But in practice this
    is only useful if there are multiple recursive iterations, at which
    point the more appropriate solution is probably a stack of
    iterators. No caller used the actual ref-store parameter anyway
    (they all call the wrapper that passes the_repository).

  - the original only kicked in the optimization when the "refname"
    pointer matched (i.e., not string comparison). We do likewise with
    the "oid" parameter here, but fall back to doing an actual oideq()
    call. This in theory lets us kick in the optimization more often,
    though in practice no current caller cares. It should never be
    wrong, though (peeling is a property of an object, so two refs
    pointing to the same object would peel identically).

  - the original took care not to touch the peeled out-parameter unless
    we found something to put in it. But no caller cares about this, and
    anyway, it is enforced by peel_object() itself (and even in the
    optimized iterator case, that's where we eventually end up). We can
    shorten the code and avoid an extra copy by just passing the
    out-parameter through the stack.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-21 15:51:31 -08:00
Abhishek Kumar e8b63005c4 commit-graph: implement generation data chunk
As discovered by Ævar, we cannot increment graph version to
distinguish between generation numbers v1 and v2 [1]. Thus, one of
pre-requistes before implementing generation number v2 was to
distinguish between graph versions in a backwards compatible manner.

We are going to introduce a new chunk called Generation DATa chunk (or
GDAT). GDAT will store corrected committer date offsets whereas CDAT
will still store topological level.

Old Git does not understand GDAT chunk and would ignore it, reading
topological levels from CDAT. New Git can parse GDAT and take advantage
of newer generation numbers, falling back to topological levels when
GDAT chunk is missing (as it would happen with a commit-graph written
by old Git).

We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
which forces commit-graph file to be written without generation data
chunk to emulate a commit-graph file written by old Git.

To minimize the space required to store corrrected commit date, Git
stores corrected commit date offsets into the commit-graph file, instea
of corrected commit dates. This saves us 4 bytes per commit, decreasing
the GDAT chunk size by half, but it's possible for the offset to
overflow the 4-bytes allocated for storage. As such overflows are and
should be exceedingly rare, we use the following overflow management
scheme:

We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV')
to store corrected commit dates for commits with offsets greater than
GENERATION_NUMBER_V2_OFFSET_MAX.

If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set
the MSB of the offset and the other bits store the position of corrected
commit date in GDOV chunk, similar to how Extra Edge List is maintained.

We test the overflow-related code with the following repo history:

           F - N - U
          /         \
U - N - U            N
         \          /
	  N - F - N

Where the commits denoted by U have committer date of zero seconds
since Unix epoch, the commits denoted by N have committer date of
1112354055 (default committer date for the test suite) seconds since
Unix epoch and the commits denoted by F have committer date of
(2 ^ 31 - 2) seconds since Unix epoch.

The largest offset observed is 2 ^ 31, just large enough to overflow.

[1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-18 16:21:18 -08:00
Junio C Hamano 8f8f10ac09 Merge branch 'jx/t5411-flake-fix'
The exchange between receive-pack and proc-receive hook did not
carefully check for errors.

* jx/t5411-flake-fix:
  receive-pack: use default version 0 for proc-receive
  receive-pack: gently write messages to proc-receive
  t5411: new helper filter_out_user_friendly_and_stable_output
2020-11-25 15:24:52 -08:00
Junio C Hamano bf0a430f70 Merge branch 'en/strmap'
A specialization of hashmap that uses a string as key has been
introduced.  Hopefully it will see wider use over time.

* en/strmap:
  shortlog: use strset from strmap.h
  Use new HASHMAP_INIT macro to simplify hashmap initialization
  strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  strmap: enable allocations to come from a mem_pool
  strmap: add a strset sub-type
  strmap: split create_entry() out of strmap_put()
  strmap: add functions facilitating use as a string->int map
  strmap: enable faster clearing and reusing of strmaps
  strmap: add more utility functions
  strmap: new utility functions
  hashmap: provide deallocation function names
  hashmap: introduce a new hashmap_partial_clear()
  hashmap: allow re-use after hashmap_free()
  hashmap: adjust spacing to fix argument alignment
  hashmap: add usage documentation explaining hashmap_free[_entries]()
2020-11-21 15:14:38 -08:00
Junio C Hamano a1f95951ef Merge branch 'en/merge-ort-api-null-impl'
Preparation for a new merge strategy.

* en/merge-ort-api-null-impl:
  merge,rebase,revert: select ort or recursive by config or environment
  fast-rebase: demonstrate merge-ort's API via new test-tool command
  merge-ort-wrappers: new convience wrappers to mimic the old merge API
  merge-ort: barebones API of new merge strategy with empty implementation
2020-11-18 13:32:53 -08:00
Junio C Hamano 7660da1618 Merge branch 'ds/maintenance-part-3'
Parts of "git maintenance" to ease writing crontab entries (and
other scheduling system configuration) for it.

* ds/maintenance-part-3:
  maintenance: add troubleshooting guide to docs
  maintenance: use 'incremental' strategy by default
  maintenance: create maintenance.strategy config
  maintenance: add start/stop subcommands
  maintenance: add [un]register subcommands
  for-each-repo: run subcommands on configured repos
  maintenance: add --schedule option and config
  maintenance: optionally skip --auto process
2020-11-18 13:32:53 -08:00
Elijah Newren b19315d8ab Use new HASHMAP_INIT macro to simplify hashmap initialization
Now that hashamp has lazy initialization and a HASHMAP_INIT macro,
hashmaps allocated on the stack can be initialized without a call to
hashmap_init() and in some cases makes the code a bit shorter.  Convert
some callsites over to take advantage of this.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 12:55:27 -08:00
Jiang Xin 80ffeb94f4 receive-pack: use default version 0 for proc-receive
In the verison negotiation phase between "receive-pack" and
"proc-receive", "proc-receive" can send an empty flush-pkt to end the
negotiation and use default version 0. Capabilities (such as
"push-options") are not supported in version 0.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 12:46:56 -08:00
Jiang Xin f65003b4c4 receive-pack: gently write messages to proc-receive
Johannes found a flaky hang in `t5411/test-0013-bad-protocol.sh` in the
osx-clang job of the CI/PR builds, and ran into an issue when using
the `--stress` option with the following error messages:

    fatal: unable to write flush packet: Broken pipe
    send-pack: unexpected disconnect while reading sideband packet
    fatal: the remote end hung up unexpectedly

In this test case, the "proc-receive" hook sends an error message and
dies earlier. While "receive-pack" on the other side of the pipe
should forward the error message of the "proc-receive" hook to the
client side, but it fails to do so. This is because "receive-pack"
uses `packet_write_fmt()` and `packet_flush()` to write pkt-line
message to "proc-receive" hook, and these functions die immediately
when pipe is broken. Using "gently" forms for these functions will get
more predicable output.

Add more "--die-*" options to test helper to test different stages of
the protocol between "receive-pack" and "proc-receive" hook.

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 12:46:56 -08:00
Junio C Hamano 6b9f5096eb Merge branch 'js/avoid-split-sideband-message'
The side-band status report can be sent at the same time as the
primary payload multiplexed, but the demultiplexer on the receiving
end incorrectly split a single status report into two, which has
been corrected.

* js/avoid-split-sideband-message:
  test-pkt-line: drop colon from sideband identity
  sideband: report unhandled incomplete sideband messages as bugs
  sideband: avoid reporting incomplete sideband messages
2020-11-02 13:17:37 -08:00
Elijah Newren 6da1a25814 hashmap: provide deallocation function names
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 12:15:50 -08:00
Elijah Newren fe1a21d526 fast-rebase: demonstrate merge-ort's API via new test-tool command
Add a new test-tool command named 'fast-rebase', which is a
super-slimmed down and nowhere near as capable version of 'git rebase'.
'test-tool fast-rebase' is not currently planned for usage in the
testsuite, but is here for two purposes:

  1) Demonstrate the desired API of merge-ort.  In particular,
     fast-rebase takes advantage of the separation of the merging
     operation from the updating of the index and working tree, to
     allow it to pick N commits, but only update the index and working
     tree once at the end.  Look for the calls to
     merge_incore_nonrecursive() and merge_switch_to_result().

  2) Provide a convenient benchmark that isn't polluted by the heavy
     disk writing and forking of unnecessary processes that comes from
     sequencer.c and merge-recursive.c.  fast-rebase is not meant to
     replace sequencer.c, just give ideas on how sequencer.c can be
     changed.  Updating sequencer.c with these goals is probably a
     large amount of work; writing a simple targeted command with
     no documentation, less-than-useful help messages, numerous
     limitations in terms of flags it can accept and situations it can
     handle, and which is flagged off from users is a much easier
     interim step.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-29 14:05:48 -07:00
Jeff King 712b0377db test-pkt-line: drop colon from sideband identity
We pass "sideband: " as our identity for errors to recv_sideband(). But
it already adds the trailing colon and space. This doesn't invalidate
any tests, but it looks funny when you examine the test output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-27 11:57:51 -07:00
Johannes Schindelin 17e7dbbcbc sideband: avoid reporting incomplete sideband messages
In 2b695ecd74 (t5500: count objects through stderr, not trace,
2020-05-06) we tried to ensure that the "Total 3" message could be
grepped in Git's output, even if it sometimes got chopped up into
multiple lines in the trace machinery.

However, the first instance where this mattered now goes through the
sideband machinery, where it is _still_ possible for messages to get
chopped up: it *is* possible for the standard error stream to be sent
byte-for-byte and hence it can be easily interrupted. Meaning: it is
possible for the single line that we're looking for to be chopped up
into multiple sideband packets, with a primary packet being delivered
between them.

This seems to happen occasionally in the `vs-test` part of our CI
builds, i.e. with binaries built using Visual C, but not when building
with GCC or clang; The symptom is that t5500.43 fails to find a line
matching `remote: Total 3` in the `log` file, which ends in something
along these lines:

	remote: Tota
	remote: l 3 (delta 0), reused 0 (delta 0), pack-reused 0

This should not happen, though: we have code in `demultiplex_sideband()`
_specifically_ to stitch back together lines that were delivered in
separate sideband packets.

However, this stitching was broken in a subtle way in fbd76cd450
(sideband: reverse its dependency on pkt-line, 2019-01-16): before that
change, incomplete sideband lines would not be flushed upon receiving a
primary packet, but after that patch, they would be.

The subtleness of this bug comes from the fact that it is easy to get
confused by the ambiguous meaning of the `break` keyword: after writing
the primary packet contents, the `break;` in the original version of
`recv_sideband()` does _not_ break out of the `while` loop, but instead
only ends the `switch` case:

	while (!retval) {
		[...]
		switch (band) {
			[...]
		case 1:
/* Write the contents of the primary packet */
			write_or_die(out, buf + 1, len);
/* Here, we do *not* break out of the loop, `retval` is unchanged */
			break;
		[...]
	}

	if (outbuf.len) {
/* Write any remaining sideband messages lacking a trailing LF */
		strbuf_addch(&outbuf, '\n');
		xwrite(2, outbuf.buf, outbuf.len);
	}

In contrast, after fbd76cd450 (sideband: reverse its dependency on
pkt-line, 2019-01-16), the body of the `while` loop was extracted into
`demultiplex_sideband()`, crucially _including_ the logic to write
incomplete sideband messages:

	switch (band) {
	[...]
	case 1:
		*sideband_type = SIDEBAND_PRIMARY;
/* This does not break out of the loop: the loop is in the caller */
		break;
	[...]
	}

cleanup:
	[...]
/* This logic is now no longer _outside_ the loop but _inside_ */
	if (scratch->len) {
		strbuf_addch(scratch, '\n');
		xwrite(2, scratch->buf, scratch->len);
	}

The correct way to fix this is to return from `demultiplex_sideband()`
early. The caller will then write out the contents of the primary packet
and continue looping. The `scratch` buffer for incomplete sideband
messages is owned by that caller, and will continue to accumulate the
remainder(s) of those messages. The loop will only end once
`demultiplex_sideband()` returned non-zero _and_ did not indicate a
primary packet, which is the case only when we hit the `cleanup:` path,
in which we take care of flushing any unfinished sideband messages and
release the `scratch` buffer.

To ensure that this does not get broken again, we introduce a pair of
subcommands of the `pkt-line` test helper that specifically chop up the
sideband message and squeeze a primary packet into the middle.

Final note: The other test case touched by 2b695ecd74 (t5500: count
objects through stderr, not trace, 2020-05-06) is not affected by this
issue because the sideband machinery is not involved there.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 13:31:00 -07:00
Junio C Hamano 19dd352d03 Merge branch 'jk/unused'
Code cleanup.

* jk/unused:
  dir.c: drop unused "untracked" from treat_path_fast()
  sequencer: handle ignore_footer when parsing trailers
  test-advise: check argument count with argc instead of argv
  sparse-checkout: fill in some options boilerplate
  sequencer: drop repository argument from run_git_commit()
  push: drop unused repo argument to do_push()
  assert PARSE_OPT_NONEG in parse-options callbacks
  env--helper: write to opt->value in parseopt helper
  drop unused argc parameters
  convert: drop unused crlf_action from check_global_conv_flags_eol()
2020-10-05 14:01:52 -07:00
Junio C Hamano c01b041ef0 Merge branch 'ds/in-merge-bases-many-optim-bug'
in_merge_bases_many(), a way to see if a commit is reachable from
any commit in a set of commits, was totally broken when the
commit-graph feature was in use, which has been corrected.

* ds/in-merge-bases-many-optim-bug:
  commit-reach: fix in_merge_bases_many bug
2020-10-05 14:01:50 -07:00
Derrick Stolee 8791bf1841 commit-reach: fix in_merge_bases_many bug
Way back in f9b8908b (commit.c: use generation numbers for
in_merge_bases(), 2018-05-01), a heuristic was used to short-circuit
the in_merge_bases() walk. This works just fine as long as the
caller is checking only two commits, but when there are multiple,
there is a possibility that this heuristic is _very wrong_.

Some code moves since then has changed this method to
repo_in_merge_bases_many() inside commit-reach.c. The heuristic
computes the minimum generation number of the "reference" list, then
compares this number to the generation number of the "commit".

In a recent topic, a test was added that used in_merge_bases_many()
to test if a commit was reachable from a number of commits pulled
from a reflog. However, this highlighted the problem: if any of the
reference commits have a smaller generation number than the given
commit, then the walk is skipped _even if there exist some with
higher generation number_.

This heuristic is wrong! It must check the MAXIMUM generation number
of the reference commits, not the MINIMUM.

This highlights a testing gap. t6600-test-reach.sh covers many
methods in commit-reach.c, including in_merge_bases() and
get_merge_bases_many(), but since these methods either restrict to
two input commits or actually look for the full list of merge bases,
they don't check this heuristic!

Add a possible input to "test-tool reach" that tests
in_merge_bases_many() and add tests to t6600-test-reach.sh that
cover this heuristic. This includes cases for the reference commits
having generation above and below the generation of the input commit,
but also having maximum generation below the generation of the input
commit.

The fix itself is to swap min_generation with a max_generation in
repo_in_merge_bases_many().

Reported-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-02 10:26:31 -07:00
Jeff King 26e28fe7bb test-advise: check argument count with argc instead of argv
We complain if "test-tool advise" is not given an argument, but we
quietly ignore any additional arguments it receives. Let's instead check
that we got the expected number. As a bonus, this silences
-Wunused-parameter, which notes that we don't ever look at argc.

While we're here, we can also fix the indentation in the conditional.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 12:53:48 -07:00
Jeff King e885a84f1b drop unused argc parameters
Many functions take an argv/argc pair, but never actually look at argc.
This makes it useless at best (we use the NULL sentinel in argv to find
the end of the array), and misleading at worst (what happens if the argc
count does not match the argv NULL?).

In each of these instances, the argv NULL does match the argc count, so
there are no bugs here. But let's tighten the interfaces to make it
harder to get wrong (and to reduce some -Wunused-parameter complaints).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 12:53:47 -07:00
Junio C Hamano 288ed98bf7 Merge branch 'tb/bloom-improvements'
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.

* tb/bloom-improvements:
  commit-graph: introduce 'commitGraph.maxNewFilters'
  builtin/commit-graph.c: introduce '--max-new-filters=<n>'
  commit-graph: rename 'split_commit_graph_opts'
  bloom: encode out-of-bounds filters as non-empty
  bloom/diff: properly short-circuit on max_changes
  bloom: use provided 'struct bloom_filter_settings'
  bloom: split 'get_bloom_filter()' in two
  commit-graph.c: store maximum changed paths
  commit-graph: respect 'commitGraph.readChangedPaths'
  t/helper/test-read-graph.c: prepare repo settings
  commit-graph: pass a 'struct repository *' in more places
  t4216: use an '&&'-chain
  commit-graph: introduce 'get_bloom_filter_settings()'
2020-09-29 14:01:20 -07:00
Junio C Hamano 6c430a647c Merge branch 'jx/proc-receive-hook'
"git receive-pack" that accepts requests by "git push" learned to
outsource most of the ref updates to the new "proc-receive" hook.

* jx/proc-receive-hook:
  doc: add documentation for the proc-receive hook
  transport: parse report options for tracking refs
  t5411: test updates of remote-tracking branches
  receive-pack: new config receive.procReceiveRefs
  doc: add document for capability report-status-v2
  New capability "report-status-v2" for git-push
  receive-pack: feed report options to post-receive
  receive-pack: add new proc-receive hook
  t5411: add basic test cases for proc-receive hook
  transport: not report a non-head push as a branch
2020-09-25 15:25:39 -07:00
Derrick Stolee 2fec604f8d maintenance: add start/stop subcommands
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.

The schedule is laid out as follows:

  0 1-23 * * *   $cmd maintenance run --schedule=hourly
  0 0    * * 1-6 $cmd maintenance run --schedule=daily
  0 0    * * 0   $cmd maintenance run --schedule=weekly

where $cmd is a properly-qualified 'git for-each-repo' execution:

$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo

where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.

This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.

The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Taylor Blau 9a7a9ed10d bloom: use provided 'struct bloom_filter_settings'
When 'get_or_compute_bloom_filter()' needs to compute a Bloom filter
from scratch, it looks to the default 'struct bloom_filter_settings' in
order to determine the maximum number of changed paths, number of bits
per entry, and so on.

All of these values have so far been constant, and so there was no need
to pass in a pointer from the caller (eg., the one that is stored in the
'struct write_commit_graph_context').

Start passing in a 'struct bloom_filter_settings *' instead of using the
default values to respect graph-specific settings (eg., in the case of
setting 'GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS').

In order to have an initialized value for these settings, move its
initialization to earlier in the commit-graph write.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 09:31:25 -07:00
Taylor Blau 312cff5207 bloom: split 'get_bloom_filter()' in two
'get_bloom_filter' takes a flag to control whether it will compute a
Bloom filter if the requested one is missing. In the next patch, we'll
add yet another parameter to this method, which would force all but one
caller to specify an extra 'NULL' parameter at the end.

Instead of doing this, split 'get_bloom_filter' into two functions:
'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only
looks up a Bloom filter (and does not compute one if it's missing,
thus dropping the 'compute_if_not_present' flag). The latter does
compute missing Bloom filters, with an additional parameter to store
whether or not it needed to do so.

This simplifies many call-sites, since the majority of existing callers
to 'get_bloom_filter' do not want missing Bloom filters to be computed
(so they can drop the parameter entirely and use the simpler version of
the function).

While we're at it, instrument the new 'get_or_compute_bloom_filter()'
with counters in the 'write_commit_graph_context' struct which store
the number of filters that we did and didn't compute, as well as filters
that were truncated.

It would be nice to drop the 'compute_if_not_present' flag entirely,
since all remaining callers of 'get_or_compute_bloom_filter' pass it as
'1', but this will change in a future patch and hence cannot be removed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 09:31:25 -07:00
Taylor Blau 24f951a492 t/helper/test-read-graph.c: prepare repo settings
The read-graph test-tool is used by a number of the commit-graph test to
assert various properties about a commit-graph. Previously, this program
never ran 'prepare_repo_settings()'. There was no need to do so, since
none of the commit-graph machinery is affected by the repo settings.

In the next patch, the commit-graph machinery's behavior will become
dependent on the repo settings, and so loading them before running the
rest of the test tool is critical.

As such, teach the test tool to call 'prepare_repo_settings()'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-09 12:51:48 -07:00
Junio C Hamano afd49c39dd Merge branch 'jk/slimmed-down'
Trim an unused binary and turn a bunch of commands into built-in.

* jk/slimmed-down:
  drop vcs-svn experiment
  make git-fast-import a builtin
  make git-bugreport a builtin
  make credential helpers builtins
  Makefile: drop builtins from MSVC pdb list
2020-09-03 12:37:02 -07:00
Junio C Hamano 0d9a8e33f9 Merge branch 'jk/leakfix'
Code clean-up.

* jk/leakfix:
  submodule--helper: fix leak of core.worktree value
  config: fix leak in git_config_get_expiry_in_days()
  config: drop git_config_get_string_const()
  config: fix leaks from git_config_get_string_const()
  checkout: fix leak of non-existent branch names
  submodule--helper: use strbuf_release() to free strbufs
  clear_pattern_list(): clear embedded hashmaps
2020-08-27 14:04:49 -07:00
Jiang Xin 15d3af5e22 receive-pack: add new proc-receive hook
Git calls an internal `execute_commands` function to handle commands
sent from client to `git-receive-pack`.  Regardless of what references
the user pushes, git creates or updates the corresponding references if
the user has write-permission.  A contributor who has no
write-permission, cannot push to the repository directly.  So, the
contributor has to write commits to an alternate location, and sends
pull request by emails or by other ways.  We call this workflow as a
distributed workflow.

It would be more convenient to work in a centralized workflow like what
Gerrit provided for some cases.  For example, a read-only user who
cannot push to a branch directly can run the following `git push`
command to push commits to a pseudo reference (has a prefix "refs/for/",
not "refs/heads/") to create a code review.

    git push origin \
        HEAD:refs/for/<branch-name>/<session>

The `<branch-name>` in the above example can be as simple as "master",
or a more complicated branch name like "foo/bar".  The `<session>` in
the above example command can be the local branch name of the client
side, such as "my/topic".

We cannot implement a centralized workflow elegantly by using
"pre-receive" + "post-receive", because Git will call the internal
function "execute_commands" to create references (even the special
pseudo reference) between these two hooks.  Even though we can delete
the temporarily created pseudo reference via the "post-receive" hook,
having a temporary reference is not safe for concurrent pushes.

So, add a filter and a new handler to support this kind of workflow.
The filter will check the prefix of the reference name, and if the
command has a special reference name, the filter will turn a specific
field (`run_proc_receive`) on for the command.  Commands with this filed
turned on will be executed by a new handler (a hook named
"proc-receive") instead of the internal `execute_commands` function.
We can use this "proc-receive" command to create pull requests or send
emails for code review.

Suggested by Junio, this "proc-receive" hook reads the commands,
push-options (optional), and send result using a protocol in pkt-line
format.  In the following example, the letter "S" stands for
"receive-pack" and letter "H" stands for the hook.

    # Version and features negotiation.
    S: PKT-LINE(version=1\0push-options atomic...)
    S: flush-pkt
    H: PKT-LINE(version=1\0push-options...)
    H: flush-pkt

    # Send commands from server to the hook.
    S: PKT-LINE(<old-oid> <new-oid> <ref>)
    S: ... ...
    S: flush-pkt
    # Send push-options only if the 'push-options' feature is enabled.
    S: PKT-LINE(push-option)
    S: ... ...
    S: flush-pkt

    # Receive result from the hook.
    # OK, run this command successfully.
    H: PKT-LINE(ok <ref>)
    # NO, I reject it.
    H: PKT-LINE(ng <ref> <reason>)
    # Fall through, let 'receive-pack' to execute it.
    H: PKT-LINE(ok <ref>)
    H: PKT-LINE(option fall-through)
    # OK, but has an alternate reference.  The alternate reference name
    # and other status can be given in options
    H: PKT-LINE(ok <ref>)
    H: PKT-LINE(option refname <refname>)
    H: PKT-LINE(option old-oid <old-oid>)
    H: PKT-LINE(option new-oid <new-oid>)
    H: PKT-LINE(option forced-update)
    H: ... ...
    H: flush-pkt

After receiving a command, the hook will execute the command, and may
create/update different reference.  For example, a command for a pseudo
reference "refs/for/master/topic" may create/update different reference
such as "refs/pull/123/head".  The alternate reference name and other
status are given in option lines.

The list of commands returned from "proc-receive" will replace the
relevant commands that are sent from user to "receive-pack", and
"receive-pack" will continue to run the "execute_commands" function and
other routines.  Finally, the result of the execution of these commands
will be reported to end user.

The reporting function from "receive-pack" to "send-pack" will be
extended in latter commit just like what the "proc-receive" hook reports
to "receive-pack".

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27 12:47:47 -07:00
Derrick Stolee d96075428a multi-pack-index: use hash version byte
Similar to the commit-graph format, the multi-pack-index format has a
byte in the header intended to track the hash version used to write the
file. This allows one to interpret the hash length without having the
context of the repository config specifying the hash length. This was
not modified as part of the SHA-256 work because the hash length was
automatically up-shifted due to that config.

Since we have this byte available, we can make the file formats more
obviously incompatible instead of relying on other context from the
repository.

Add a new oid_version() method in midx.c similar to the one in
commit-graph.c. This is specifically made separate from that
implementation to avoid artificially linking the formats.

The test impact requires a few more things than the corresponding change
in the commit-graph format. Specifically, 'test-tool read-midx' was not
writing anything about this header value to output. Since the value
available in 'struct multi_pack_index' is hash_len instead of a version
value, we output "20" or "32" instead of "1" or "2".

Since we want a user to not have their Git commands fail if their
multi-pack-index has the incorrect hash version compared to the
repository's hash version, we relax the die() to an error() in
load_multi_pack_index(). This has some effect on 'git multi-pack-index
verify' as we need to check that a failed parse of a file that exists is
actually a verify error. For that test that checks the hash version
matches, we change the corrupted byte from "2" to "3" to ensure the test
fails for both hash algorithms.

Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 16:45:20 -07:00
Jeff King f1de981e8b config: fix leaks from git_config_get_string_const()
There are two functions to get a single config string:

  - git_config_get_string()

  - git_config_get_string_const()

One might naively think that the first one allocates a new string and
the second one just points us to the internal configset storage. But
in fact they both allocate a new copy; the second one exists only to
avoid having to cast when using it with a const global which we never
intend to free.

The documentation for the function explains that clearly, but it seems
I'm not alone in being surprised by this. Of 17 calls to the function,
13 of them leak the resulting value.

We could obviously fix these by adding the appropriate free(). But it
would be simpler still if we actually had a non-allocating way to get
the string. There's git_config_get_value() but that doesn't quite do
what we want. If the config key is present but is a boolean with no
value (e.g., "[foo]bar" in the file), then we'll get NULL (whereas the
string versions will print an error and die).

So let's introduce a new variant, git_config_get_string_tmp(), that
behaves as these callers expect. We need a new name because we have new
semantics but the same function signature (so even if we converted the
four remaining callers, topics in flight might be surprised). The "tmp"
is because this value should only be held onto for a short time. In
practice it's rare for us to clear and refresh the configset,
invalidating the pointer, but hopefully the "tmp" makes callers think
about the lifetime. In each of the converted cases here the value only
needs to last within the local function or its immediate caller.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-14 10:52:04 -07:00
Jeff King fc47391e24 drop vcs-svn experiment
The code in vcs-svn was started in 2010 as an attempt to build a
remote-helper for interacting with svn repositories (as opposed to
git-svn). However, we never got as far as shipping a mature remote
helper, and the last substantive commit was e99d012a6b in 2012.

We do have a git-remote-testsvn, and it is even installed as part of
"make install". But given the name, it seems unlikely to be used by
anybody (you'd have to explicitly "git clone testsvn::$url", and there
have been zero mentions of that on the mailing list since 2013, and even
that includes the phrase "you might need to hack a bit to get it working
properly"[1]).

We also ship contrib/svn-fe, which builds on the vcs-svn work. However,
it does not seem to build out of the box for me, as the link step misses
some required libraries for using libgit.a. Curiously, the original
build breakage bisects for me to eff80a9fd9 (Allow custom "comment
char", 2013-01-16), which seems unrelated. There was an attempt to fix
it in da011cb0e7 (contrib/svn-fe: fix Makefile, 2014-08-28), but on my
system that only switches the error message.

So it seems like the result is not really usable by anybody in practice.
It would be wonderful if somebody wanted to pick up the topic again, and
potentially it's worth carrying around for that reason. But the flip
side is that people doing tree-wide operations have to deal with this
code.  And you can see the list with (replace "HEAD" with this commit as
appropriate):

  {
    echo "--"
    git diff-tree --diff-filter=D -r --name-only HEAD^ HEAD
  } |
  git log --no-merges --oneline e99d012a6bc.. --stdin

which shows 58 times somebody had to deal with the code, generally due
to a compile or test failure, or a tree-wide style fix or API change.
Let's drop it and let anybody who wants to pick it up do so by
resurrecting it from the git history.

As a bonus, this also reduces the size of a stripped installation of Git
from 21MB to 19MB.

[1] https://lore.kernel.org/git/CALkWK0mPHzKfzFKKpZkfAus3YVC9NFYDbFnt+5JQYVKipk3bQQ@mail.gmail.com/

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-13 11:02:15 -07:00
Junio C Hamano e0ad9574dd Merge branch 'bc/sha-256-part-3'
The final leg of SHA-256 transition.

* bc/sha-256-part-3: (39 commits)
  t: remove test_oid_init in tests
  docs: add documentation for extensions.objectFormat
  ci: run tests with SHA-256
  t: make SHA1 prerequisite depend on default hash
  t: allow testing different hash algorithms via environment
  t: add test_oid option to select hash algorithm
  repository: enable SHA-256 support by default
  setup: add support for reading extensions.objectformat
  bundle: add new version for use with SHA-256
  builtin/verify-pack: implement an --object-format option
  http-fetch: set up git directory before parsing pack hashes
  t0410: mark test with SHA1 prerequisite
  t5308: make test work with SHA-256
  t9700: make hash size independent
  t9500: ensure that algorithm info is preserved in config
  t9350: make hash size independent
  t9301: make hash size independent
  t9300: use $ZERO_OID instead of hard-coded object ID
  t9300: abstract away SHA-1-specific constants
  t8011: make hash size independent
  ...
2020-08-11 18:04:11 -07:00
Jeff King d70a9eb611 strvec: rename struct fields
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").

Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 19:18:06 -07:00
brian m. carlson 094a685cd7 t: make test-bloom initialize repository
The bloom filter code relies on reading object IDs using parse_oid_hex.
In order to make that work with an appropriate size, we need to have
initialized the repository's hash algorithm.  Since the values we're
processing depend on the repository in use, let's set up the repository
when we run the test helper.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 09:16:45 -07:00
Jeff King f6d8942b1f strvec: fix indentation in renamed calls
Code which split an argv_array call across multiple lines, like:

  argv_array_pushl(&args, "one argument",
                   "another argument", "and more",
		   NULL);

was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:

  strvec_pushl(&args, "one argument",
                   "another argument", "and more",
		   NULL);

Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:

  git jump grep 'strvec_.*,$'

and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King c972bf4cf5 strvec: convert remaining callers away from argv_array name
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).

This patch converts all of the remaining files, as the resulting diff is
reasonably sized.

The conversion was done purely mechanically with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe '
    s/ARGV_ARRAY/STRVEC/g;
    s/argv_array/strvec/g;
  '

We'll deal with any indentation/style fallouts separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King dbbcd44fb4 strvec: rename files from argv-array to strvec
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe 's/argv-array.h/strvec.h/'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:17 -07:00
Junio C Hamano 0258ed1e08 Merge branch 'cb/is-descendant-of'
Code clean-up.

* cb/is-descendant-of:
  commit-reach: avoid is_descendant_of() shim
2020-07-06 22:09:16 -07:00
Junio C Hamano 645f63111b Merge branch 'es/get-worktrees-unsort'
API cleanup for get_worktrees()

* es/get-worktrees-unsort:
  worktree: drop get_worktrees() unused 'flags' argument
  worktree: drop get_worktrees() special-purpose sorting option
2020-07-06 22:09:15 -07:00
Junio C Hamano d80bea479d Merge branch 'ak/commit-graph-to-slab'
A few fields in "struct commit" that do not have to always be
present have been moved to commit slabs.

* ak/commit-graph-to-slab:
  commit-graph: minimize commit_graph_data_slab access
  commit: move members graph_pos, generation to a slab
  commit-graph: introduce commit_graph_data_slab
  object: drop parsed_object_pool->commit_count
2020-07-06 22:09:14 -07:00
Junio C Hamano 12210859da Merge branch 'bc/sha-256-part-2'
SHA-256 migration work continues.

* bc/sha-256-part-2: (44 commits)
  remote-testgit: adapt for object-format
  bundle: detect hash algorithm when reading refs
  t5300: pass --object-format to git index-pack
  t5704: send object-format capability with SHA-256
  t5703: use object-format serve option
  t5702: offer an object-format capability in the test
  t/helper: initialize the repository for test-sha1-array
  remote-curl: avoid truncating refs with ls-remote
  t1050: pass algorithm to index-pack when outside repo
  builtin/index-pack: add option to specify hash algorithm
  remote-curl: detect algorithm for dumb HTTP by size
  builtin/ls-remote: initialize repository based on fetch
  t5500: make hash independent
  serve: advertise object-format capability for protocol v2
  connect: parse v2 refs with correct hash algorithm
  connect: pass full packet reader when parsing v2 refs
  Documentation/technical: document object-format for protocol v2
  t1302: expect repo format version 1 for SHA-256
  builtin/show-index: provide options to determine hash algo
  t5302: modernize test formatting
  ...
2020-07-06 22:09:13 -07:00
Carlo Marcelo Arenas Belón c1ea625f72 commit-reach: avoid is_descendant_of() shim
d91d6fbf26 (commit-reach: create repo_is_descendant_of(), 2020-06-17)
adds a repository aware version of is_descendant_of() and a backward
compatibility shim that is barely used.

Update all callers to directly use the new repo_is_descendant_of()
function instead; making the codebase simpler and pushing more
the_repository references higher up the stack.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 16:36:53 -07:00
Eric Sunshine 03f2465bb1 worktree: drop get_worktrees() unused 'flags' argument
get_worktrees() accepts a 'flags' argument, however, there are no
existing flags (the lone flag GWT_SORT_LINKED was recently retired) and
no behavior which can be tweaked. Therefore, drop the 'flags' argument.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 10:31:15 -07:00
brian m. carlson 54cbbe4c6e t/helper: initialize the repository for test-sha1-array
test-sha1-array uses the_hash_algo under the hood. Since t0064 wants to
use the value that is correct for the hash algorithm that we're testing,
make sure the test helper initializes the repository to set
the_hash_algo correctly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:08 -07:00
Abhishek Kumar 6da43d937c object: drop parsed_object_pool->commit_count
14ba97f8 (alloc: allow arbitrary repositories for alloc functions,
2018-05-15) introduced parsed_object_pool->commit_count to keep count of
commits per repository and was used to assign commit->index.

However, commit-slab code requires commit->index values to be unique
and a global count would be correct, rather than a per-repo count.

Let's introduce a static counter variable, `parsed_commits_count` to
keep track of parsed commits so far.

As commit_count has no use anymore, let's also drop it from the struct.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17 14:37:14 -07:00
Junio C Hamano b37fd14beb Merge branch 'dl/remote-curl-deadlock-fix'
On-the-wire protocol v2 easily falls into a deadlock between the
remote-curl helper and the fetch-pack process when the server side
prematurely throws an error and disconnects.  The communication has
been updated to make it more robust.

* dl/remote-curl-deadlock-fix:
  stateless-connect: send response end packet
  pkt-line: define PACKET_READ_RESPONSE_END
  remote-curl: error on incomplete packet
  pkt-line: extern packet_length()
  transport: extract common fetch_pack() call
  remote-curl: remove label indentation
  remote-curl: fix typo
2020-06-08 18:06:30 -07:00
Junio C Hamano f4cec40dbd Merge branch 'cb/t4210-illseq-auto-detect'
As FreeBSD is not the only platform whose regexp library reports
a REG_ILLSEQ error when fed invalid UTF-8, add logic to detect that
automatically and skip the affected tests.

* cb/t4210-illseq-auto-detect:
  t4210: detect REG_ILLSEQ dynamically and skip affected tests
  t/helper: teach test-regex to report pattern errors (like REG_ILLSEQ)
2020-06-08 18:06:27 -07:00
Denton Liu 0181b600a6 pkt-line: define PACKET_READ_RESPONSE_END
In a future commit, we will use PACKET_READ_RESPONSE_END to separate
messages proxied by remote-curl. To prepare for this, add the
PACKET_READ_RESPONSE_END enum value.

In switch statements that need a case added, die() or BUG() when a
PACKET_READ_RESPONSE_END is unexpected. Otherwise, mirror how
PACKET_READ_DELIM is implemented (especially in cases where packets are
being forwarded).

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24 16:26:00 -07:00
Carlo Marcelo Arenas Belón aba8187e4d t/helper: teach test-regex to report pattern errors (like REG_ILLSEQ)
7187c7bbb8 (t4210: skip i18n tests that don't work on FreeBSD, 2019-11-27)
adds a REG_ILLSEQ prerequisite to avoid failures from the tests added in
4e2443b181 (log tests: test regex backends in "--encode=<enc>" tests,
2019-06-28), but hardcodes it to be only enabled in FreeBSD.

Instead of hardcoding the affected platform, teach the test-regex helper,
how to validate a pattern and report back, so it can be used to detect the
same issue in other affected systems (like DragonFlyBSD or macOS).

While at it, refactor the tool so it can report back the source of the
errors it founds, and can be invoked also in a --silent mode, when needed,
for backward compatibility.  A missing flag has been added and the code
reformatted, as well as updates to the way the parameters are handled, for
consistency.

To minimize changes, it is assumed the regcomp error is of the right type
since we control the only caller, and is also assumed to affect both basic
and extended syntax (only basic is tested, but both behave the same in all
three affected platforms since they use the same function).

Based-on-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 13:03:35 -07:00
Junio C Hamano 4b1e5e5d8c Merge branch 'ds/bloom-cleanup'
Code cleanup and typofixes

* ds/bloom-cleanup:
  completion: offer '--(no-)patch' among 'git log' options
  bloom: use num_changes not nr for limit detection
  bloom: de-duplicate directory entries
  Documentation: changed-path Bloom filters use byte words
  bloom: parse commit before computing filters
  test-bloom: fix usage typo
  bloom: fix whitespace around tab length
2020-05-14 14:39:44 -07:00
Đoàn Trần Công Danh 066b70ae97 bloom: fix make sparse warning
* We need a `final_new_line` to make our source code as text file, per
  POSIX and C specification.
* `bloom_filters` should be limited to interal linkage only

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-07 17:08:21 -07:00
Junio C Hamano bf04590ecd Merge branch 'dd/sparse-fixes'
Compilation fix.

* dd/sparse-fixes:
  progress.c: silence cgcc suggestion about internal linkage
  graph.c: limit linkage of internal variable
  compat/regex: move stdlib.h up in inclusion chain
  test-parse-pathspec-file.c: s/0/NULL/ for pointer type
2020-05-01 13:39:56 -07:00
Junio C Hamano 6d56d4c7dc Merge branch 'ds/blame-on-bloom'
"git blame" learns to take advantage of the "changed-paths" Bloom
filter stored in the commit-graph file.

* ds/blame-on-bloom:
  test-bloom: check that we have expected arguments
  test-bloom: fix some whitespace issues
  blame: drop unused parameter from maybe_changed_path
  blame: use changed-path Bloom filters
  tests: write commit-graph with Bloom filters
  revision: complicated pathspecs disable filters
2020-05-01 13:39:54 -07:00
Junio C Hamano 9b6606f43d Merge branch 'gs/commit-graph-path-filter'
Introduce an extension to the commit-graph to make it efficient to
check for the paths that were modified at each commit using Bloom
filters.

* gs/commit-graph-path-filter:
  bloom: ignore renames when computing changed paths
  commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag
  t4216: add end to end tests for git log with Bloom filters
  revision.c: add trace2 stats around Bloom filter usage
  revision.c: use Bloom filters to speed up path based revision walks
  commit-graph: add --changed-paths option to write subcommand
  commit-graph: reuse existing Bloom filters during write
  commit-graph: write Bloom filters to commit graph file
  commit-graph: examine commits by generation number
  commit-graph: examine changed-path objects in pack order
  commit-graph: compute Bloom filters for changed paths
  diff: halt tree-diff early after max_changes
  bloom.c: core Bloom filter implementation for changed paths.
  bloom.c: introduce core Bloom filter constructs
  bloom.c: add the murmur3 hash implementation
  commit-graph: define and use MAX_NUM_CHUNKS
2020-05-01 13:39:53 -07:00
Junio C Hamano 6a1c17d05b Merge branch 'tb/commit-graph-split-strategy'
"git commit-graph write" learned different ways to write out split
files.

* tb/commit-graph-split-strategy:
  Revert "commit-graph.c: introduce '--[no-]check-oids'"
  commit-graph.c: introduce '--[no-]check-oids'
  commit-graph.h: replace 'commit_hex' with 'commits'
  oidset: introduce 'oidset_size'
  builtin/commit-graph.c: introduce split strategy 'replace'
  builtin/commit-graph.c: introduce split strategy 'no-merge'
  builtin/commit-graph.c: support for '--split[=<strategy>]'
  t/helper/test-read-graph.c: support commit-graph chains
2020-05-01 13:39:52 -07:00
Derrick Stolee 54c337be9c test-bloom: fix usage typo
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-01 11:41:21 -07:00
Junio C Hamano 28ba5a7b27 Merge branch 'dd/test-with-busybox'
Various tests have been updated to work around issues found with
shell utilities that come with busybox etc.

* dd/test-with-busybox:
  t5703: feed raw data into test-tool unpack-sideband
  t4124: tweak test so that non-compliant diff(1) can also be used
  t7063: drop non-POSIX argument "-ls" from find(1)
  t5616: use rev-parse instead to get HEAD's object_id
  t5003: skip conversion test if unzip -a is unavailable
  t5003: drop the subshell in test_lazy_prereq
  test-lib-functions: test_cmp: eval $GIT_TEST_CMP
  t4061: use POSIX compliant regex(7)
2020-04-28 15:49:55 -07:00
Đoàn Trần Công Danh 3cacb9aaf4 progress.c: silence cgcc suggestion about internal linkage
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Reviewed-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-27 11:21:28 -07:00
Đoàn Trần Công Danh 1437ebf74a test-parse-pathspec-file.c: s/0/NULL/ for pointer type
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Reviewed-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-27 11:21:12 -07:00
Jeff King 1b4c57fa87 test-bloom: check that we have expected arguments
If "test-tool bloom" is not fed a command, or if arguments are missing
for some commands, it will just segfault. Let's check argc and write a
friendlier usage message.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23 16:05:29 -07:00
Jeff King 24b7d1e7b0 test-bloom: fix some whitespace issues
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-23 16:05:29 -07:00
Junio C Hamano a768f866e9 Merge branch 'jk/oid-array-cleanups'
Code cleanup.

* jk/oid-array-cleanups:
  oidset: stop referring to sha1-array
  ref-filter: stop referring to "sha1 array"
  bisect: stop referring to sha1_array
  test-tool: rename sha1-array to oid-array
  oid_array: rename source file from sha1-array
  oid_array: use size_t for iteration
  oid_array: use size_t for count and allocation
2020-04-22 13:42:49 -07:00
Junio C Hamano 810dc6481a Merge branch 'js/trace2-env-vars'
Trace2 enhancement to allow logging of the environment variables.

* js/trace2-env-vars:
  trace2: teach Git to log environment variables
2020-04-22 13:42:44 -07:00
Taylor Blau 2fa05f31bd t/helper/test-read-graph.c: support commit-graph chains
In 61df89c8e5 (commit-graph: don't early exit(1) on e.g. "git status",
2019-03-25), the former 'load_commit_graph_one' was refactored into
'open_commit_graph' and 'load_commit_graph_one_fd_st' as a means of
avoiding an early-exit from non-library code.

However, 'load_commit_graph_one' does not support commit-graph chains,
and hence the 'read-graph' test tool does not work with them.

Replace 'load_commit_graph_one' with 'read_commit_graph_one' in order to
support commit-graph chains. In the spirit of 61df89c8e5,
'read_commit_graph_one' does not ever 'die()', making it a suitable
replacement here.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-15 09:20:24 -07:00
Garima Singh a759bfa9ee t4216: add end to end tests for git log with Bloom filters
These tests exercises writing commit graph with Bloom filters
and exercises 'git log -- path' with all the applicable
options. They check that the output is the same with and
without Bloom filters, confirm Bloom filters were used by
checking if trace2 statistics were logged correctly.

Also confirms cases where Bloom filters are not used:
1. Multiple path specs,
2. --walk-reflogs (see patch titled 'revision.c: use Bloom filters...'
   for details,
3. If the latest commit graph does not have Bloom filters

Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06 11:08:37 -07:00
Garima Singh 1217c03e7b commit-graph: reuse existing Bloom filters during write
Add logic to
a) parse Bloom filter information from the commit graph file and,
b) re-use existing Bloom filters.

See Documentation/technical/commit-graph-format for the format in which
the Bloom filter information is written to the commit graph file.

To read Bloom filter for a given commit with lexicographic position
'i' we need to:
1. Read BIDX[i] which essentially gives us the starting index in BDAT for
   filter of commit i+1. It is essentially the index past the end
   of the filter of commit i. It is called end_index in the code.

2. For i>0, read BIDX[i-1] which will give us the starting index in BDAT
   for filter of commit i. It is called the start_index in the code.
   For the first commit, where i = 0, Bloom filter data starts at the
   beginning, just past the header in the BDAT chunk. Hence, start_index
   will be 0.

3. The length of the filter will be end_index - start_index, because
   BIDX[i] gives the cumulative 8-byte words including the ith
   commit's filter.

We toggle whether Bloom filters should be recomputed based on the
compute_if_not_present flag.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06 11:08:37 -07:00
Jeff King ed4b804e46 test-tool: rename sha1-array to oid-array
This matches the actual data structure name, as well as the source file
that contains the code we're testing. The test scripts need updating to
use the new name, as well.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:59:08 -07:00
Jeff King fe299ec5ae oid_array: rename source file from sha1-array
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to
oid_array, 2017-03-31), but the file is still called sha1-array. Besides
being slightly confusing, it makes it more annoying to grep for leftover
occurrences of "sha1" in various files, because the header is included
in so many places.

Let's complete the transition by renaming the source and header files
(and fixing up a few comment references).

I kept the "-" in the name, as that seems to be our style; cf.
fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10).
We also have oidmap.h and oidset.h without any punctuation, but those
are "struct oidmap" and "struct oidset" in the code. We _could_ make
this "oidarray" to match, but somehow it looks uglier to me because of
the length of "array" (plus it would be a very invasive patch for little
gain).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:59:08 -07:00
Garima Singh ed591febb4 bloom.c: core Bloom filter implementation for changed paths.
Add the core implementation for computing Bloom filters for
the paths changed between a commit and it's first parent.

We fill the Bloom filters as (const char *data, int len) pairs
as `struct bloom_filters" within a commit slab.

Filters for commits with no changes and more than 512 changes,
is represented with a filter of length zero. There is no gain
in distinguishing between a computed filter of length zero for
a commit with no changes, and an uncomputed filter for new commits
or for commits with more than 512 changes. The effect on
`git log -- path` is the same in both cases. We will fall back to
the normal diffing algorithm when we can't benefit from the
existence of Bloom filters.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 09:59:53 -07:00
Garima Singh f1294eaf7f bloom.c: introduce core Bloom filter constructs
Introduce the constructs for Bloom filters, Bloom filter keys
and Bloom filter settings.
For details on what Bloom filters are and how they work, refer
to Dr. Derrick Stolee's blog post [1]. It provides a concise
explanation of the adoption of Bloom filters as described in
[2] and [3].

Implementation specifics:
1. We currently use 7 and 10 for the number of hashes and the
   size of each entry respectively. They served as great starting
   values, the mathematical details behind this choice are
   described in [1] and [4]. The implementation, while not
   completely open to it at the moment, is flexible enough to allow
   for tweaking these settings in the future.

   Note: The performance gains we have observed with these values
   are significant enough that we did not need to tweak these
   settings. The performance numbers are included in the cover letter
   of this series and in the commit message of the subsequent commit
   where we use Bloom filters to speed up `git log -- path`.

2. As described in [1] and [3], we do not need 7 independent hashing
   functions. We use the Murmur3 hashing scheme, seed it twice and
   then combine those to procure an arbitrary number of hash values.

3. The filters will be sized according to the number of changes in
   each commit, in multiples of 8 bit words.

[1] Derrick Stolee
      "Supercharging the Git Commit Graph IV: Bloom Filters"
      https://devblogs.microsoft.com/devops/super-charging-the-git-commit-graph-iv-Bloom-filters/

[2] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese
    "An Improved Construction for Counting Bloom Filters"
    http://theory.stanford.edu/~rinap/papers/esa2006b.pdf
    https://doi.org/10.1007/11841036_61

[3] Peter C. Dillinger and Panagiotis Manolios
    "Bloom Filters in Probabilistic Verification"
    http://www.ccs.neu.edu/home/pete/pub/Bloom-filters-verification.pdf
    https://doi.org/10.1007/978-3-540-30494-4_26

[4] Thomas Mueller Graf, Daniel Lemire
    "Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters"
    https://arxiv.org/abs/1912.08258

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 09:59:53 -07:00
Garima Singh f52207a45c bloom.c: add the murmur3 hash implementation
In preparation for computing changed paths Bloom filters,
implement the Murmur3 hash algorithm as described in [1].
It hashes the given data using the given seed and produces
a uniformly distributed hash value.

[1] https://en.wikipedia.org/wiki/MurmurHash#Algorithm

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Reviewed-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 09:59:53 -07:00
Đoàn Trần Công Danh 84370e36bb t5703: feed raw data into test-tool unpack-sideband
busybox's sed isn't binary clean.
Thus, triggers false-negative on this test.

We could replace sed with perl on this usecase.
But, we could slightly modify the helper to discard unwanted data in the
beginning.

Fix the false negative by updating this helper.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-26 17:30:48 -07:00
Junio C Hamano f8cb64e3d4 Merge branch 'bc/sha-256-part-1-of-4'
SHA-256 transition continues.

* bc/sha-256-part-1-of-4: (22 commits)
  fast-import: add options for rewriting submodules
  fast-import: add a generic function to iterate over marks
  fast-import: make find_marks work on any mark set
  fast-import: add helper function for inserting mark object entries
  fast-import: permit reading multiple marks files
  commit: use expected signature header for SHA-256
  worktree: allow repository version 1
  init-db: move writing repo version into a function
  builtin/init-db: add environment variable for new repo hash
  builtin/init-db: allow specifying hash algorithm on command line
  setup: allow check_repository_format to read repository format
  t/helper: make repository tests hash independent
  t/helper: initialize repository if necessary
  t/helper/test-dump-split-index: initialize git repository
  t6300: make hash algorithm independent
  t6300: abstract away SHA-1-specific constants
  t: use hash-specific lookup tables to define test constants
  repository: require a build flag to use SHA-256
  hex: add functions to parse hex object IDs in any algorithm
  hex: introduce parsing variants taking hash algorithms
  ...
2020-03-26 17:11:20 -07:00
Junio C Hamano 4d0e8996ec Merge branch 'am/real-path-fix'
The real_path() convenience function can easily be misused; with a
bit of code refactoring in the callers' side, its use has been
eliminated.

* am/real-path-fix:
  get_superproject_working_tree(): return strbuf
  real_path_if_valid(): remove unsafe API
  real_path: remove unsafe API
  set_git_dir: fix crash when used with real_path()
2020-03-25 13:57:43 -07:00
Junio C Hamano c4a09cc9cc Merge branch 'hw/advise-ng'
Revamping of the advise API to allow more systematic enumeration of
advice knobs in the future.

* hw/advise-ng:
  tag: use new advice API to check visibility
  advice: revamp advise API
  advice: change "setupStreamFailure" to "setUpstreamFailure"
  advice: extract vadvise() from advise()
2020-03-25 13:57:41 -07:00
Josh Steadmon 3d3adaad91 trace2: teach Git to log environment variables
Via trace2, Git can already log interesting config parameters (see the
trace2_cmd_list_config() function). However, this can grant an
incomplete picture because many config parameters also allow overrides
via environment variables.

To allow for more complete logs, we add a new trace2_cmd_list_env_vars()
function and supporting implementation, modeled after the pre-existing
config param logging implementation.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-23 13:14:53 -07:00
Alexandr Miloslavskiy 3d7747e318 real_path: remove unsafe API
Returning a shared buffer invites very subtle bugs due to reentrancy or
multi-threading, as demonstrated by the previous patch.

There was an unfinished effort to abolish this [1].

Let's finally rid of `real_path()`, using `strbuf_realpath()` instead.

This patch uses a local `strbuf` for most places where `real_path()` was
previously called.

However, two places return the value of `real_path()` to the caller. For
them, a `static` local `strbuf` was added, effectively pushing the
problem one level higher:
    read_gitfile_gently()
    get_superproject_working_tree()

[1] https://lore.kernel.org/git/1480964316-99305-1-git-send-email-bmwill@google.com/

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-10 11:41:40 -07:00
Junio C Hamano 0e0d717537 Merge branch 'pb/am-show-current-patch'
"git am --short-current-patch" is a way to show the piece of e-mail
for the stopped step, which is not suitable to directly feed "git
apply" (it is designed to be a good "git am" input).  It learned a
new option to show only the patch part.

* pb/am-show-current-patch:
  am: support --show-current-patch=diff to retrieve .git/rebase-apply/patch
  am: support --show-current-patch=raw as a synonym for--show-current-patch
  am: convert "resume" variable to a struct
  parse-options: convert "command mode" to a flag
  parse-options: add testcases for OPT_CMDMODE()
2020-03-09 11:21:19 -07:00
Heba Waly b3b18d1621 advice: revamp advise API
Currently it's very easy for the advice library's callers to miss
checking the visibility step before printing an advice. Also, it makes
more sense for this step to be handled by the advice library.

Add a new advise_if_enabled function that checks the visibility of
advice messages before printing.

Add a new helper advise_enabled to check the visibility of the advice
if the caller needs to carry out complicated processing based on that
value.

A list of advice_settings is added to cache the config variables names
and values, it's intended to replace advice_config[] and the global
variables once we migrate all the callers to use the new APIs.

Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-05 06:15:02 -08:00
Junio C Hamano ff41848e99 Merge branch 'rs/micro-cleanups'
Code cleanup.

* rs/micro-cleanups:
  use strpbrk(3) to search for characters from a given set
  quote: use isalnum() to check for alphanumeric characters
2020-03-02 15:07:20 -08:00
Junio C Hamano d0038f4b31 Merge branch 'bw/remote-rename-update-config'
"git remote rename X Y" needs to adjust configuration variables
(e.g. branch.<name>.remote) whose value used to be X to Y.
branch.<name>.pushRemote is now also updated.

* bw/remote-rename-update-config:
  remote rename/remove: gently handle remote.pushDefault config
  config: provide access to the current line number
  remote rename/remove: handle branch.<name>.pushRemote config values
  remote: clean-up config callback
  remote: clean-up by returning early to avoid one indentation
  pull --rebase/remote rename: document and honor single-letter abbreviations rebase types
2020-02-25 11:18:32 -08:00
brian m. carlson bf154a8782 t/helper: make repository tests hash independent
This test currently hard-codes the hash algorithm as SHA-1 when calling
repo_set_hash_algo so that the_hash_algo is properly initialized.
However, this does not work with SHA-256 repositories. Read the
repository value that repo_init has read into the local repository
variable and set the algorithm based on that value.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:27 -08:00
brian m. carlson 8dca7f30e4 t/helper: initialize repository if necessary
The repository helper is used in t5318 to read commit graphs whether
we're in a repository or not. However, without a repository, we have no
way to properly initialize the hash algorithm, meaning that the file is
misread.

Fix this by calling setup_git_directory_gently, which will read the
environment variable the testsuite sets to ensure that the correct hash
algorithm is set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:27 -08:00
brian m. carlson 6946e525ae t/helper/test-dump-split-index: initialize git repository
In this test helper, we read the index.  In order to have the proper
hash algorithm set up, we must call setup_git_directory.  Do so, so that
the test works when extensions.objectFormat is set.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:24 -08:00