Commit graph

9292 commits

Author SHA1 Message Date
Junio C Hamano a1f95951ef Merge branch 'en/merge-ort-api-null-impl'
Preparation for a new merge strategy.

* en/merge-ort-api-null-impl:
  merge,rebase,revert: select ort or recursive by config or environment
  fast-rebase: demonstrate merge-ort's API via new test-tool command
  merge-ort-wrappers: new convience wrappers to mimic the old merge API
  merge-ort: barebones API of new merge strategy with empty implementation
2020-11-18 13:32:53 -08:00
Junio C Hamano 7660da1618 Merge branch 'ds/maintenance-part-3'
Parts of "git maintenance" to ease writing crontab entries (and
other scheduling system configuration) for it.

* ds/maintenance-part-3:
  maintenance: add troubleshooting guide to docs
  maintenance: use 'incremental' strategy by default
  maintenance: create maintenance.strategy config
  maintenance: add start/stop subcommands
  maintenance: add [un]register subcommands
  for-each-repo: run subcommands on configured repos
  maintenance: add --schedule option and config
  maintenance: optionally skip --auto process
2020-11-18 13:32:53 -08:00
Junio C Hamano c042c455d4 Merge branch 'pw/rebase-i-orig-head'
"git rebase -i" did not store ORIG_HEAD correctly.

* pw/rebase-i-orig-head:
  rebase -i: simplify get_revision_ranges()
  rebase -i: use struct object_id when writing state
  rebase -i: use struct object_id rather than looking up commit
  rebase -i: stop overwriting ORIG_HEAD buffer
2020-11-18 13:32:53 -08:00
Junio C Hamano 5edc8bdc06 Merge branch 'jk/format-patch-output'
"git format-patch --output=there" did not work as expected and
instead crashed.  The option is now supported.

* jk/format-patch-output:
  format-patch: support --output option
  format-patch: tie file-opening logic to output_directory
  format-patch: refactor output selection
2020-11-18 13:32:52 -08:00
Junio C Hamano f8a1cee7b3 Merge branch 'jc/line-log-takes-no-pathspec'
"git log -L<range>:<path>" is documented to take no pathspec, but
this was not enforced by the command line option parser, which has
been corrected.

* jc/line-log-takes-no-pathspec:
  log: diagnose -L used with pathspec as an error
2020-11-18 13:32:52 -08:00
Junio C Hamano 30f5257611 Merge branch 'rs/empty-reflog-check-fix'
The code to see if "git stash drop" can safely remove refs/stash
has been made more carerful.

* rs/empty-reflog-check-fix:
  stash: simplify reflog emptiness check
2020-11-18 13:32:52 -08:00
Taylor Blau 2fcb03b52d builtin/repack.c: don't move existing packs out of the way
When 'git repack' creates a pack with the same name as any existing
pack, it moves the existing one to 'old-pack-xxx.{pack,idx,...}' and
then renames the new one into place.

Eventually, it would be nice to have 'git repack' allow for writing a
multi-pack index at the critical time (after the new packs have been
written / moved into place, but before the old ones have been deleted).
Guessing that this option might be called '--write-midx', this makes the
following situation (where repacks are issued back-to-back without any
new objects) impossible:

    $ git repack -adb
    $ git repack -adb --write-midx

In the second repack, the existing packs are overwritten verbatim with
the same rename-to-old sequence. At that point, the current MIDX is
invalidated, since it refers to now-missing packs. So that code wants to
be run after the MIDX is re-written. But (prior to this patch) the new
MIDX can't be written until the new packs are moved into place. So, we
have a circular dependency.

This is all hypothetical, since no code currently exists to write a MIDX
safely during a 'git repack' (the 'GIT_TEST_MULTI_PACK_INDEX' does so
unsafely). Putting hypothetical aside, though: why do we need to rename
existing packs to be prefixed with 'old-' anyway?

This behavior dates all the way back to 2ad47d6 (git-repack: Be
careful when updating the same pack as an existing one., 2006-06-25).
2ad47d6 is mainly concerned about a case where a newly written pack
would have a different structure than its index. This used to be
possible when the pack name was a hash of the set of objects. Under this
naming scheme, two packs that store the same set of objects could differ
in delta selection, object positioning, or both. If this happened, then
any such packs would be unreadable in the instant between copying the
new pack and new index (i.e., either the index or pack will be stale
depending on the order that they were copied).

But since 1190a1a (pack-objects: name pack files after trailer hash,
2013-12-05), this is no longer possible, since pack files are named not
after their logical contents (i.e., the set of objects), but by the
actual checksum of their contents. So, this old- behavior can safely go,
which allows us to avoid our circular dependency above.

In addition to avoiding the circular dependency, this patch also makes
'git repack' a lot simpler, since we don't have to deal with failures
encountered when renaming existing packs to be prefixed with 'old-'.

This patch is mostly limited to removing code paths that deal with the
'old' prefixing, with the exception of files that include the pack's
name in their own filename, like .idx, .bitmap, and related files. The
exception is that we want to continue to trust what pack-objects wrote.
That is, it is not the case that we pretend as if pack-objects didn't
write files identical to ones that already exist, but rather that we
respect what pack-objects wrote as the source of truth. That cuts two
ways:

  - If pack-objects produced an identical pack to one that already
    exists with a bitmap, but did not produce a bitmap, we remove the
    bitmap that already exists. (This behavior is codified in t7700.14).

  - If pack-objects produced an identical pack to one that already
    exists, we trust the just-written version of the coresponding .idx,
    .promisor, and other files over the ones that already exist. This
    ensures that we use the most up-to-date versions of this files,
    which is safe even in the face of format changes in, say, the .idx
    file (which would not be reflected in the .idx file's name).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-17 13:31:55 -08:00
Philippe Blain 5176f20ffe pull: check for local submodule modifications with the right range
Ever since 'git pull' learned '--recurse-submodules' in a6d7eb2c7a
(pull: optionally rebase submodules (remote submodule changes only),
2017-06-23), we check if there are local submodule modifications by
checking the revision range 'curr_head --not rebase_fork_point'.

The goal of this check is to abort the pull if there are submodule
modifications in the local commits being rebased, since this scenario is
not supported.

However, the actual range of commits being rebased is not
'rebase_fork_point..curr_head', as the logic in
'get_rebase_newbase_and_upstream' reveals, it is 'upstream..curr_head'.

If the 'git merge-base --fork-point' invocation in
'get_rebase_fork_point' fails to find a fork point between the current
branch and the remote-tracking branch we are pulling from,
'rebase_fork_point' is null and since 4d36f88be7 (submodule: do not pass
null OID to setup_revisions, 2018-05-24), 'submodule_touches_in_range'
checks 'curr_head' and all its ancestors for submodule modifications.

Since it is highly likely that there are submodule modifications in this
range (which is in effect the whole history of the current branch), this
prevents 'git pull --rebase --recurse-submodules' from succeeding if no
fork point exists between the current branch and the remote-tracking
branch being pulled. This can happen, for example, when the current
branch was forked from a commit which was never recorded in the reflog
of the remote-tracking branch we are pulling, as the last two paragraphs
of the "Discussion on fork-point mode" section in git-merge-base(1)
explain.

Fix this bug by passing 'upstream' instead of 'rebase_fork_point' as the
'excl_oid' argument to 'submodule_touches_in_range'.

Reported-by: Brice Goglin <bgoglin@free.fr>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 16:01:13 -08:00
Philippe Blain 4f66d79ae3 pull --rebase: compute rebase arguments in separate function
The function 'run_rebase' is responsible for constructing the
command line to be passed to 'git rebase'. This includes both forwarding
pass-through options given to 'git pull' as well computing the <newbase>
and <upstream> arguments to 'git rebase'.

A following commit will need to access the <upstream> argument in
'cmd_pull' to fix a bug with 'git pull --rebase --recurse-submodules'.
In order to do so, refactor the code so that the <newbase> and
<upstream> commits are computed in a new, separate function,
'get_rebase_newbase_and_upstream'.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 16:01:13 -08:00
Taylor Blau 704c4a5c07 builtin/repack.c: keep track of what pack-objects wrote
In the subsequent commit, it will become useful to keep track of which
metadata files were written by pack-objects. We already do this to an
extent with the 'exts' array, which only is used in the context of
existing packs.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 15:57:44 -08:00
Jeff King 63f4d5cf57 repack: make "exts" array available outside cmd_repack()
We'll use it in a helper function soon.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 15:57:43 -08:00
Patrick Steinhardt 8c4417f1cf update-ref: disallow "start" for ongoing transactions
It is currently possible to write multiple "start" commands into
git-update-ref(1) for a single session, but none of them except for the
first one actually have any effect.

Using such nested "start"s may eventually have a sensible effect. One
may imagine that it restarts the current transaction, effectively
emptying it and creating a new one. It may also allow for creation of
nested transactions. But currently, none of these are implemented.

Silently ignoring this misuse is making it hard to iterate in the future
if "start" is ever going to have meaningful semantics in such a context.
This commit thus makes sure to error out in case we see such use.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 13:44:01 -08:00
Patrick Steinhardt 262a4d28fe update-ref: allow creation of multiple transactions
While git-update-ref has recently grown commands which allow interactive
control of transactions in e48cf33b61 (update-ref: implement interactive
transaction handling, 2020-04-02), it is not yet possible to create
multiple transactions in a single session. To do so, one currently still
needs to invoke the executable multiple times.

This commit addresses this shortcoming by allowing the "start" command
to create a new transaction if the current transaction has already been
either committed or aborted.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 13:44:01 -08:00
Jeff King a9bc372ef8 use size_t to store pack .idx byte offsets
We sometimes store the offset into a pack .idx file as an "unsigned
long", but the mmap'd size of a pack .idx file can exceed 4GB. This is
sufficient on LP64 systems like Linux, but will be too small on LLP64
systems like Windows, where "unsigned long" is still only 32 bits. Let's
use size_t, which is a better type for an offset into a memory buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 13:41:35 -08:00
Jeff King f86f769550 compute pack .idx byte offsets using size_t
A pack and its matching .idx file are limited to 2^32 objects, because
the pack format contains a 32-bit field to store the number of objects.
Hence we use uint32_t in the code.

But the byte count of even a .idx file can be much larger than that,
because it stores at least a hash and an offset for each object. So
using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650
objects. This confuses load_idx(), which computes the minimum size like
this:

  unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz;

Even though min_size will be big enough on most 64-bit platforms, the
actual arithmetic is done as a uint32_t, resulting in a truncation. We
actually exceed that min_size, but then we do:

  unsigned long max_size = min_size;
  if (nr)
          max_size += (nr - 1)*8;

to account for the variable-sized table. That computation doesn't
overflow quite so low, but with the truncation for min_size, we end up
with a max_size that is much smaller than our actual size. So we
complain that the idx is invalid, and can't find any of its objects.

We can fix this case by casting "nr" to a size_t, which will do the
multiplication in 64-bits (assuming you're on a 64-bit platform; this
will never work on a 32-bit system since we couldn't map the whole .idx
anyway). Likewise, we don't have to worry about further additions,
because adding a smaller number to a size_t will convert the other side
to a size_t.

A few notes:

  - obviously we could just declare "nr" as a size_t in the first place
    (and likewise, packed_git.num_objects).  But it's conceptually a
    uint32_t because of the on-disk format, and we correctly treat it
    that way in other contexts that don't need to compute byte offsets
    (e.g., iterating over the set of objects should and generally does
    use a uint32_t). Switching to size_t would make all of those other
    cases look wrong.

  - it could be argued that the proper type is off_t to represent the
    file offset. But in practice the .idx file must fit within memory,
    because we mmap the whole thing. And the rest of the code (including
    the idx_size variable we're comparing against) uses size_t.

  - we'll add the same cast to the max_size arithmetic line. Even though
    we're adding to a larger type, which will convert our result, the
    multiplication is still done as a 32-bit value and can itself
    overflow. I didn't check this with my test case, since it would need
    an even larger pack (~530M objects), but looking at compiler output
    shows that it works this way. The standard should agree, but I
    couldn't find anything explicit in 6.3.1.8 ("usual arithmetic
    conversions").

The case in load_idx() was the most immediate one that I was able to
trigger. After fixing it, looking up actual objects (including the very
last one in sha1 order) works in a test repo with 153,725,110 objects.
That's because bsearch_hash() works with uint32_t entry indices, and the
actual byte access:

  int cmp = hashcmp(table + mi * stride, sha1);

is done with "stride" as a size_t, causing the uint32_t "mi" to be
promoted to a size_t. This is the way most code will access the index
data.

However, I audited all of the other byte-wise accesses of
packed_git.index_data, and many of the others are suspect (they are
similar to the max_size one, where we are adding to a properly sized
offset or directly to a pointer, but the multiplication in the
sub-expression can overflow). I didn't trigger any of these in practice,
but I believe they're potential problems, and certainly adding in the
cast is not going to hurt anything here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 13:41:35 -08:00
Josh Steadmon a2a066d96a receive-pack: log received client session ID
When receive-pack receives a session-id capability from the client, log
the received session ID via a trace2 data event.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 18:26:53 -08:00
Josh Steadmon 8073d75bbf receive-pack: advertise session ID in v0 capabilities
When transfer.advertiseSID is true, advertise receive-pack's session ID
via the new session-id capability.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 18:26:53 -08:00
Junio C Hamano 902f358555 Merge branch 'rs/clear-commit-marks-in-repo'
Code clean-up.

* rs/clear-commit-marks-in-repo:
  bisect: clear flags in passed repository
  object: allow clear_commit_marks_all to handle any repo
2020-11-11 13:18:37 -08:00
Elijah Newren 449a900969 shortlog: use strset from strmap.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 12:55:27 -08:00
Elijah Newren b19315d8ab Use new HASHMAP_INIT macro to simplify hashmap initialization
Now that hashamp has lazy initialization and a HASHMAP_INIT macro,
hashmaps allocated on the stack can be initialized without a call to
hashmap_init() and in some cases makes the code a bit shorter.  Convert
some callsites over to take advantage of this.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 12:55:27 -08:00
Jiang Xin 80ffeb94f4 receive-pack: use default version 0 for proc-receive
In the verison negotiation phase between "receive-pack" and
"proc-receive", "proc-receive" can send an empty flush-pkt to end the
negotiation and use default version 0. Capabilities (such as
"push-options") are not supported in version 0.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 12:46:56 -08:00
Jiang Xin f65003b4c4 receive-pack: gently write messages to proc-receive
Johannes found a flaky hang in `t5411/test-0013-bad-protocol.sh` in the
osx-clang job of the CI/PR builds, and ran into an issue when using
the `--stress` option with the following error messages:

    fatal: unable to write flush packet: Broken pipe
    send-pack: unexpected disconnect while reading sideband packet
    fatal: the remote end hung up unexpectedly

In this test case, the "proc-receive" hook sends an error message and
dies earlier. While "receive-pack" on the other side of the pipe
should forward the error message of the "proc-receive" hook to the
client side, but it fails to do so. This is because "receive-pack"
uses `packet_write_fmt()` and `packet_flush()` to write pkt-line
message to "proc-receive" hook, and these functions die immediately
when pipe is broken. Using "gently" forms for these functions will get
more predicable output.

Add more "--die-*" options to test helper to test different stages of
the protocol between "receive-pack" and "proc-receive" hook.

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 12:46:56 -08:00
Jeff King 3a1f91cfd9 rev-parse: handle --end-of-options
We taught rev-list a new way to separate options from revisions in
19e8789b23 (revision: allow --end-of-options to end option parsing,
2019-08-06), but rev-parse uses its own parser. It should know about
--end-of-options not only for consistency, but because it may be
presented with similarly ambiguous cases. E.g., if a caller does:

  git rev-parse "$rev" -- "$path"

to parse an untrusted input, then it will get confused if $rev contains
an option-like string like "--local-env-vars". Or even "--not-real",
which we'd keep as an option to pass along to rev-list.

Or even more importantly:

  git rev-parse --verify "$rev"

can be confused by options, even though its purpose is safely parsing
untrusted input. On the plus side, it will always fail the --verify
part, as it will not have parsed a revision, so the caller will
generally "fail closed" rather than continue to use the untrusted
string. But it will still trigger whatever option was in "$rev"; this
should be mostly harmless, since rev-parse options are all read-only,
but I didn't carefully audit all paths.

This patch lets callers write:

  git rev-parse --end-of-options "$rev" -- "$path"

and:

  git rev-parse --verify --end-of-options "$rev"

which will both treat "$rev" always as a revision parameter. The latter
is a bit clunky. It would be nicer if we had defined "--verify" to
require that its next argument be the revision. But we have not
historically done so, and:

  git rev-parse --verify -q "$rev"

does currently work. I added a test here to confirm that we didn't break
that.

A few implementation notes:

 - We don't document --end-of-options explicitly in commands, but rather
   in gitcli(7). So I didn't give it its own section in git-rev-parse(1).
   But I did call it out specifically in the --verify section, and
   include it in the examples, which should show best practices.

 - We don't have to re-indent the main option-parsing block, because we
   can combine our "did we see end of options" check with "does it start
   with a dash". The exception is the pre-setup options, which need
   their own block.

 - We do however have to pull the "--" parsing out of the "does it start
   with dash" block, because we want to parse it even if we've seen
   --end-of-options.

 - We'll leave "--end-of-options" in the output. This is probably not
   technically necessary, as a careful caller will do:

     git rev-parse --end-of-options $revs -- $paths

   and anything in $revs will be resolved to an object id. However, it
   does help a slightly less careful caller like:

     git rev-parse --end-of-options $revs_or_paths

   where a path "--foo" will remain in the output as long as it also
   exists on disk. In that case, it's helpful to retain --end-of-options
   to get passed along to rev-list, s it would otherwise see just
   "--foo".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-10 13:46:27 -08:00
Jeff King 9033addfa6 rev-parse: put all options under the "-" check
The option-parsing loop of rev-parse checks whether the first character
of an arg is "-". If so, then it enters a series of conditionals
checking for individual options. But some options are inexplicably
outside of that outer conditional.

This doesn't produce the wrong behavior; the conditional is actually
redundant with the individual option checks, and it's really only its
fallback "continue" that we care about. But we should at least be
consistent.

One obvious alternative is that we could get rid of the conditional
entirely. But we'll be using the extra block it provides in the next
patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-10 13:46:27 -08:00
Jeff King e05e2ae8fe rev-parse: don't accept options after dashdash
Because of the order in which we check options in rev-parse, there are a
few options we accept even after a "--". This is wrong, because the
whole point of "--" is to say "everything after here is a path". Let's
move the "did we see a dashdash" check (it's called "as_is" in the code)
to the top of the parsing loop.

Note there is one subtlety here. The options are ordered so that some
are checked before we even see if we're in a repository (they continue
the loop, and if we get past a certain point, then we do the repository
setup). By moving the as_is check higher, it's also in that "before
setup" section, even though it might look at the repository via
verify_filename(). However, this works out: we'd never set as_is until
we parse "--", and we don't parse that until after doing the setup.

An alternative here to avoid the subtlety is to put the as_is check at
the top of the post-setup options. But then every pre-setup option would
have to remember to check "if (!as_is && !strcmp(...))". So while this
is a bit magical, it's harder for future code to get wrong.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-10 13:46:27 -08:00
Junio C Hamano 3baf58bfb4 format-patch: make output filename configurable
For the past 15 years, we've used the hardcoded 64 as the length
limit of the filename of the output from the "git format-patch"
command.  Since the value is shorter than the 80-column terminal, it
could grow without line wrapping a bit.  At the same time, since the
value is longer than half of the 80-column terminal, we could fit
two or more of them in "ls" output on such a terminal if we allowed
to lower it.

Introduce a new command line option --filename-max-length=<n> and a
new configuration variable format.filenameMaxLength to override the
hardcoded default.

While we are at it, remove a check that the name of output directory
does not exceed PATH_MAX---this check is pointless in that by the
time control reaches the function, the caller would already have
done an equivalent of "mkdir -p", so if the system does not like an
overly long directory name, the control wouldn't have reached here,
and otherwise, we know that the system allowed the output directory
to exist.  In the worst case, we will get an error when we try to
open the output file and handle the error correctly anyway.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-09 17:44:41 -08:00
Junio C Hamano 6a44c9c0d0 Merge branch 'jk/committer-date-is-author-date-fix-simplify'
Code simplification.

* jk/committer-date-is-author-date-fix-simplify:
  am, sequencer: stop parsing our own committer ident
2020-11-09 14:06:28 -08:00
Junio C Hamano ecf95d938b Merge branch 'ab/git-remote-exit-code'
Exit codes from "git remote add" etc. were not usable by scripted
callers.

* ab/git-remote-exit-code:
  remote: add meaningful exit code on missing/existing
2020-11-09 14:06:26 -08:00
Junio C Hamano 92d6bd2e90 Merge branch 'jk/checkout-index-errors'
"git checkout-index" did not consistently signal an error with its
exit status.

* jk/checkout-index-errors:
  checkout-index: propagate errors to exit code
  checkout-index: drop error message from empty --stage=all
2020-11-09 14:06:26 -08:00
Junio C Hamano cfdc70b299 Merge branch 'mr/bisect-in-c-3'
Rewriting "git bisect" in C continues.

* mr/bisect-in-c-3:
  bisect--helper: retire `--bisect-autostart` subcommand
  bisect--helper: retire `--write-terms` subcommand
  bisect--helper: retire `--check-expected-revs` subcommand
  bisect--helper: reimplement `bisect_state` & `bisect_head` shell functions in C
  bisect--helper: retire `--next-all` subcommand
  bisect--helper: retire `--bisect-clean-state` subcommand
  bisect--helper: finish porting `bisect_start()` to C
2020-11-09 14:06:25 -08:00
Phillip Wood 8843302307 rebase -i: simplify get_revision_ranges()
Now that all the external users of head_hash have been converted to
use a opts->orig_head instead we can stop returning head_hash from
get_revision_ranges().

Because we want to pass the full object names back to the caller in
`revisions` the find_unique_abbrev_r() call that was used to initialize
`head_hash` is replaced with oid_to_hex().

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:10:41 -08:00
Phillip Wood a2bb10d06d rebase -i: use struct object_id when writing state
Rather than passing a string around pass the struct object_id that the
string was created from call oid_hex() when we write the file.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:10:41 -08:00
Phillip Wood f3e27a02d5 rebase -i: use struct object_id rather than looking up commit
We already have a struct object_id containing the oid that we want to
set ORIG_HEAD to so use that rather than converting it to a string and
then calling get_oid() on that string.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:10:41 -08:00
Phillip Wood e100bea481 rebase -i: stop overwriting ORIG_HEAD buffer
After rebasing, ORIG_HEAD is supposed to point to the old HEAD of the
rebased branch.  The code used find_unique_abbrev() to obtain the
object name of the old HEAD and wrote to both
.git/rebase-merge/orig-head (used by `rebase --abort` to go back to
the previous state) and to ORIG_HEAD.  The buffer find_unique_abbrev()
gives back is volatile, unfortunately, and was overwritten after the
former file is written but before ORIG_FILE is written, leaving an
incorrect object name in it.

Avoid relying on the volatile buffer of find_unique_abbrev(), and
instead supply our own buffer to keep the object name.

I think that all of the users of head_hash should actually be using
opts->orig_head instead as passing a string rather than a struct
object_id around is a hang over from the scripted implementation. This
patch just fixes the immediate bug and adds a regression test based on
Caspar's reproduction example[1]. The users will be converted to use
struct object_id and head_hash removed in the next few commits.

[1] https://lore.kernel.org/git/CAFzd1+7PDg2PZgKw7U0kdepdYuoML9wSN4kofmB_-8NHrbbrHg@mail.gmail.com

Reported-by: Caspar Duregger <herr.kaste@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:10:41 -08:00
Jeff King dc1672dd10 format-patch: support --output option
We've never intended to support diff's --output option in format-patch.
And until baa4adc66a (parse-options: disable option abbreviation with
PARSE_OPT_KEEP_UNKNOWN, 2019-01-27), it was impossible to trigger. We
first parse the format-patch options before handing the remainder off to
setup_revisions(). Before that commit, we'd accept "--output=foo" as an
abbreviation for "--output-directory=foo". But afterwards, we don't
check abbreviations, and --output gets passed to the diff code.

This results in nonsense behavior and bugs. The diff code will have
opened a filehandle at rev.diffopt.file, but we'll overwrite that with
our own handles that we open for each individual patch file. So the
--output file will always just be empty. But worse, the diff code also
sets rev.diffopt.close_file, so log_tree_commit() will close the
filehandle itself. And then the main loop in cmd_format_patch() will try
to close it again, resulting in a double-free.

The simplest solution would be to just disallow --output with
format-patch, as nobody ever intended it to work. However, we have
accidentally documented it (because format-patch includes diff-options).
And it does work with "git log", which writes the whole output to the
specified file. It's easy enough to make that work for format-patch,
too: it's really the same as --stdout, but pointed at a specific file.

We can detect the use of the --output option by the "close_file" flag
(note that we can't use rev.diffopt.file, since the diff setup will
otherwise set it to stdout). So we just need to unset that flag, but
don't have to do anything else. Our situation is otherwise exactly like
--stdout (note that we don't fclose() the file, but nor does the stdout
case; exiting the program takes care of that for us).

Reported-by: Johannes Postler <johannes.postler@txture.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:05:29 -08:00
Jeff King 1e1693b2bb format-patch: tie file-opening logic to output_directory
In format-patch we're either outputting to stdout or to individual files
in an output directory (which may be just "./"). Our logic for whether
to open a new file for each patch is checked with "!use_stdout", but it
is equally correct to check for a non-NULL output_directory.

The distinction will matter when we add a new single-stream output in a
future patch, when only one of the three methods will want individual
files. Let's swap the logic here in preparation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:05:28 -08:00
Jeff King 4c6f781f9c format-patch: refactor output selection
The --stdout and --output-directory options are mutually exclusive, but
it's hard to tell from reading the code. We have three separate
conditionals that check for use_stdout, and it's only after we've set up
the output_directory fully that we check whether the user also specified
--stdout.

Instead, let's check the exclusion explicitly first, then have a single
conditional that handles stdout versus an output directory. This is
slightly easier to follow now, and also will keep things sane when we
add another output mode in a future patch.

We'll add a few tests as well, covering the mutual exclusion and the
fact that we are not confused by a configured output directory.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 14:05:28 -08:00
Junio C Hamano 39664cb0ac log: diagnose -L used with pathspec as an error
The -L option is documented to accept no pathspec, but the
command line option parser has allowed the combination without
checking so far.  Ensure that there is no pathspec when the -L
option is in effect to fix this.

Incidentally, this change fixes another bug in the command line
option parser, which has allowed the -L option used together
with the --follow option.  Because the latter requires exactly
one path given, but the former takes no pathspec, they become
mutually incompatible automatically.  Because the -L option
follows renames on its own, there is no reason to give --follow
at the same time.

The new tests say they may fail with "-L and --follow being
incompatible" instead of "-L and pathspec being incompatible".
Currently the expected failure can come only from the latter, but
this is to futureproof them, in case we decide to add code to
explicititly die on -L and --follow used together.

Heled-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-04 13:38:33 -08:00
Elijah Newren 14c4586c2d merge,rebase,revert: select ort or recursive by config or environment
Allow the testsuite to run where it treats requests for "recursive" or
the default merge algorithm via consulting the environment variable
GIT_TEST_MERGE_ALGORITHM which is expected to either be "recursive" (the
old traditional algorithm) or "ort" (the new algorithm).

Also, allow folks to pick the new algorithm via config setting.  It
turns out builtin/merge.c already had a way to allow users to specify a
different default merge algorithm: pull.twohead.  Rather odd
configuration name (especially to be in the 'pull' namespace rather than
'merge') but it's there.  Add that same configuration to rebase,
cherry-pick, and revert.

This required updating the various callsites that called merge_trees()
or merge_recursive() to conditionally call the new API, so this serves
as another demonstration of what the new API looks and feels like.
There are almost certainly some callsites that have not yet been
modified to work with the new merge algorithm, but this represents the
ones that I have been testing with thus far.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 16:35:50 -08:00
Junio C Hamano 1ae0949a03 Merge branch 'mk/diff-ignore-regex'
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.

* mk/diff-ignore-regex:
  diff: add -I<regex> that ignores matching changes
  merge-base, xdiff: zero out xpparam_t structures
2020-11-02 13:17:44 -08:00
Junio C Hamano 51830654fc Merge branch 'jk/fast-import-marks-cleanup'
Code clean-up.

* jk/fast-import-marks-cleanup:
  fast-import: remove duplicated option-parsing line
2020-11-02 13:17:40 -08:00
Junio C Hamano e0f6ad2984 Merge branch 'tk/credential-config'
"git credential' didn't honor the core.askPass configuration
variable (among other things), which has been corrected.

* tk/credential-config:
  credential: load default config
2020-11-02 13:17:39 -08:00
Junio C Hamano b6fb70c985 Merge branch 'dl/diff-merge-base'
"git diff A...B" learned "git diff --merge-base A B", which is a
longer short-hand to say the same thing.

* dl/diff-merge-base:
  contrib/completion: complete `git diff --merge-base`
  builtin/diff-tree: learn --merge-base
  builtin/diff-index: learn --merge-base
  t4068: add --merge-base tests
  diff-lib: define diff_get_merge_base()
  diff-lib: accept option flags in run_diff_index()
  contrib/completion: extract common diff/difftool options
  git-diff.txt: backtick quote command text
  git-diff-index.txt: make --cached description a proper sentence
  t4068: remove unnecessary >tmp
2020-11-02 13:17:39 -08:00
Junio C Hamano 761a4e9ab1 Merge branch 'bk/sob-dco'
Document that the meaning of a Signed-off-by trailer can vary from
project to project in the end-user documentation, and clarify what
it means to this project.

* bk/sob-dco:
  Documentation: stylistically normalize references to Signed-off-by:
  SubmittingPatches: clarify DCO is our --signoff rule
  Documentation: clarify and expand description of --signoff
  doc: preparatory clean-up of description on the sign-off option
2020-11-02 13:17:39 -08:00
Junio C Hamano 0be2d65132 Merge branch 'ds/maintenance-commit-graph-auto-fix'
Test-coverage enhancement of running commit-graph task "git
maintenance" as needed led to discovery and fix of a bug.

* ds/maintenance-commit-graph-auto-fix:
  maintenance: core.commitGraph=false prevents writes
  maintenance: test commit-graph auto condition
2020-11-02 13:17:39 -08:00
Junio C Hamano cd47bbe164 Merge branch 'jk/fast-import-marks-alloc-fix'
"git fast-import" wasted a lot of memory when many marks were in use.

* jk/fast-import-marks-alloc-fix:
  fast-import: fix over-allocation of marks storage
2020-11-02 13:17:37 -08:00
Elijah Newren 6da1a25814 hashmap: provide deallocation function names
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 12:15:50 -08:00
Philippe Blain 3af31e8786 blame: simplify 'setup_blame_bloom_data' interface
The penultimate commit moved the initialization of 'sb.path' in
'builtin/blame.c::cmd_blame' before the call to
'blame.c::setup_blame_bloom_data'. Since 'cmd_blame' is the only caller
of 'setup_blame_bloom_data', it is now unnecessary for
'setup_blame_bloom_data' to receive 'path' as a separate argument, as
'sb.path' is already initialized.

Remove this argument from setup_blame_bloom_data's interface and use the
'path' field of the 'sb' 'struct blame_scoreboard' instead.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-01 15:54:15 -08:00
Philippe Blain 88894aaeea blame: simplify 'setup_scoreboard' interface
The previous commit moved the initialization of 'sb.path' in
'builtin/blame.c::cmd_blame' before the call to
'blame.c::setup_scoreboard'. Since 'cmd_blame' is the only caller of
'setup_scoreboard', it is now unnecessary for 'setup_scoreboard' to
receive 'path' as a separate argument, as 'sb.path' is already
initialized.

Remove this argument from setup_scoreboard's interface and use the
'path' field of the 'sb' 'struct blame_scoreboard' instead.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-01 15:54:15 -08:00
Philippe Blain 9466e3809d blame: enable funcname blaming with userdiff driver
In blame.c::cmd_blame, we send the 'path' field of the 'sb' 'struct
blame_scoreboard' as the 'path' argument to
'line-range.c::parse_range_arg', but 'sb.path' is not set yet; it's set
to the local variable 'path' a few lines later at line 1137.

This 'path' argument is only used in 'parse_range_arg' if we are blaming
a funcname, i.e. `git blame -L :<funcname> <path>`, and in that case it
is sent to 'parse_range_funcname', where it is used to determine if a
userdiff driver should be used for said <path> to match the given
funcname.

Since 'path' is yet unset, the userdiff driver is never used, so we fall
back to the default funcname regex, which is usually not appropriate for
paths that are set to use a specific userdiff driver, and thus either we
match some unrelated lines, or we die with

    fatal: -L parameter '<funcname>' starting at line 1: no match

This has been the case ever since `git blame` learned to blame a
funcname in 13b8f68c1f (log -L: :pattern:file syntax to find by
funcname, 2013-03-28).

Enable funcname blaming for paths using specific userdiff drivers by
initializing 'sb.path' earlier in 'cmd_blame', when some of its other
fields are initialized, so that it is set when passed to
'parse_range_arg'.

Add a regression test in 'annotate-tests.sh', which is sourced in
t8001-annotate.sh and t8002-blame.sh, leveraging an existing file used
to test the userdiff patterns in t4018-diff-funcname.

Also, use 'sb.path' instead of 'path' when constructing the error
message at line 1114, for consistency.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-01 15:54:15 -08:00
Philippe Blain 180d641d7d line-log: mention both modes in 'blame' and 'log' short help
'git blame -h' and 'git log -h' both show '-L <n,m>' and describe this
option as "Process only line range n,m, counting from 1". No hint is
given that a function name regex can also be used.

Use <range> instead, and expand the description of the option to mention
both modes. Remove "counting from 1" as it's uneeded; it's uncommon to
refer to the first line of a file as "line 0".

Also, for 'git log', improve the wording to better reflect the long help.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-01 15:54:14 -08:00
René Scharfe 4f44c5659b stash: simplify reflog emptiness check
Calling rev-parse to check if the drop subcommand removed the last stash
and treating its failure as confirmation is fragile, as the command can
fail for other reasons, e.g. because the system is out of memory.
Directly check if the reflog is empty instead, which is more robust.

Reported-by: Marek Mrva <mrva@eof-studios.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-01 15:51:31 -08:00
René Scharfe cd8888452c object: allow clear_commit_marks_all to handle any repo
Allow callers to specify the repository to use.  Rename the function to
repo_clear_commit_marks to document its new scope.  No functional change
intended.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-31 10:46:34 -07:00
Junio C Hamano 4f9f7c1442 Merge branch 'jk/committer-date-is-author-date-fix' into maint
In 2.29, "--committer-date-is-author-date" option of "rebase" and
"am" subcommands lost the e-mail address by mistake, which has been
corrected.

* jk/committer-date-is-author-date-fix:
  rebase: fix broken email with --committer-date-is-author-date
  am: fix broken email with --committer-date-is-author-date
  t3436: check --committer-date-is-author-date result more carefully
2020-10-29 14:18:47 -07:00
Junio C Hamano 0e41cfad62 Merge branch 'dl/checkout-guess'
"git checkout" learned to use checkout.guess configuration variable
and enable/disable its "--[no-]guess" option accordingly.

* dl/checkout-guess:
  checkout: learn to respect checkout.guess
  Documentation/config/checkout: replace sq with backticks
2020-10-27 15:09:51 -07:00
Junio C Hamano f3cfeb3078 Merge branch 'dl/checkout-p-merge-base'
"git checkout -p A...B [-- <path>]" did not work, even though the
same command without "-p" correctly used the merge-base between
commits A and B.

* dl/checkout-p-merge-base:
  t2016: add a NEEDSWORK about the PERL prerequisite
  add-patch: add NEEDSWORK about comparing commits
  Doc: document "A...B" form for <tree-ish> in checkout and switch
  builtin/checkout: fix `git checkout -p HEAD...` bug
2020-10-27 15:09:51 -07:00
Junio C Hamano 40696c6727 Merge branch 'sb/clone-origin'
"git clone" learned clone.defaultremotename configuration variable
to customize what nickname to use to call the remote the repository
was cloned from.

* sb/clone-origin:
  clone: allow configurable default for `-o`/`--origin`
  clone: read new remote name from remote_name instead of option_origin
  clone: validate --origin option before use
  refs: consolidate remote name validation
  remote: add tests for add and rename with invalid names
  clone: use more conventional config/option layering
  clone: add tests for --template and some disallowed option pairs
2020-10-27 15:09:50 -07:00
Junio C Hamano de0a7effc8 Merge branch 'sk/force-if-includes'
"git push --force-with-lease[=<ref>]" can easily be misused to lose
commits unless the user takes good care of their own "git fetch".
A new option "--force-if-includes" attempts to ensure that what is
being force-pushed was created after examining the commit at the
tip of the remote ref that is about to be force-replaced.

* sk/force-if-includes:
  t, doc: update tests, reference for "--force-if-includes"
  push: parse and set flag for "--force-if-includes"
  push: add reflog check for "--force-if-includes"
2020-10-27 15:09:49 -07:00
Junio C Hamano 52b8c8c716 Merge branch 'ds/maintenance-part-2'
"git maintenance", an extended big brother of "git gc", continues
to evolve.

* ds/maintenance-part-2:
  maintenance: add incremental-repack auto condition
  maintenance: auto-size incremental-repack batch
  maintenance: add incremental-repack task
  midx: use start_delayed_progress()
  midx: enable core.multiPackIndex by default
  maintenance: create auto condition for loose-objects
  maintenance: add loose-objects task
  maintenance: add prefetch task
2020-10-27 15:09:47 -07:00
Junio C Hamano 26bb5437f6 Merge branch 'rs/worktree-list-show-locked'
"git worktree list" now shows if each worktree is locked.  This
possibly may open us to show other kinds of states in the future.

* rs/worktree-list-show-locked:
  worktree: teach `list` to annotate locked worktree
2020-10-27 15:09:47 -07:00
Junio C Hamano ae84e924da Merge branch 'rs/tighten-callers-of-deref-tag'
Code clean-up.

* rs/tighten-callers-of-deref-tag:
  line-log: handle deref_tag() returning NULL
  blame: handle deref_tag() returning NULL
  grep: handle deref_tag() returning NULL
2020-10-27 15:09:46 -07:00
Jeff King 7e41061588 checkout-index: propagate errors to exit code
If we encounter an error while checking out an explicit path, we print a
message to stderr but do not actually exit with a non-zero code. While
this is a plumbing command and the behavior goes all the way back to
33db5f4d90 (Add a "checkout-cache" command which does what the name
suggests., 2005-04-09), this is almost certainly an oversight:

  - we _do_ return an exit code from checkout_file(); the caller just
    never reads it

  - errors while checking out all paths (with "-a") do result in a
    non-zero exit code.

  - it would be quite unusual not to use the exit code for an error,
    as otherwise the caller has no idea the command failed except by
    scraping stderr

To keep our tests simple and portable, we can use the most obvious
error: asking to checkout a path which is not in the index at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-27 12:41:56 -07:00
Jeff King 0b809c8248 checkout-index: drop error message from empty --stage=all
If checkout-index is given --stage=all for a specific path, it will try
to write stages 1-3 (if present) for that path to temporary files.
However, if the file is present only at stage 0, it writes nothing but
gives a confusing message:

  $ git checkout-index --stage=all -- Makefile
  git checkout-index: Makefile does not exist at stage 4

This is nonsense. There is no stage 4 (it's just an internal enum value
we use for "all"), and the documentation clearly states:

  Paths which only have a stage 0 entry will always be omitted from the
  output.

Here it's talking about the list of tempfiles written to stdout, but it
seems clear that this case was not meant to be an error. We even have a
test which covers it, but it only checks that the command reports an
exit code of 0, not its stderr. And it reports 0 only because of another
bug which fails to propagate errors (which will be fixed in a subsequent
patch).

So let's make the test more thorough. We'll also cover the case that we
found _no_ entry, not even a stage zero, which should still be an error.
However, because of the other bug, we'll have to mark this as expecting
failure for the moment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-27 12:41:54 -07:00
Ævar Arnfjörð Bjarmason 9144ba4cf5 remote: add meaningful exit code on missing/existing
Change the exit code for the likes of "git remote add/rename" to exit
with 2 if the remote in question doesn't exist, and 3 if it
does. Before we'd just die() and exit with the general 128 exit code.

This changes the output message from e.g.:

    fatal: remote origin already exists.

To:

    error: remote origin already exists.

Which I believe is a feature, since we generally use "fatal" for the
generic errors, and "error" for the more specific ones with a custom
exit code, but this part of the change may break code that already
relies on stderr parsing (not that we ever supported that...).

The motivation for this is a discussion around some code in GitLab's
gitaly which wanted to check this, and had to parse stderr to do so:
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2695

It's worth noting as an aside that a method of checking this that
doesn't rely on that is to check with "git config" whether the value
in question does or doesn't exist. That introduces a TOCTOU race
condition, but on the other hand this code (e.g. "git remote add")
already has a TOCTOU race.

We go through the config.lock for the actual setting of the config,
but the pseudocode logic is:

    read_config();
    check_config_and_arg_sanity();
    save_config();

So e.g. if a sleep() is added right after the remote_is_configured()
check in add() we'll clobber remote.NAME.url, and add another (usually
duplicate) remote.NAME.fetch entry (and other values, depending on
invocation).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-27 11:40:33 -07:00
Junio C Hamano f34687dc81 Merge branch 'jk/committer-date-is-author-date-fix'
In 2.29, "--committer-date-is-author-date" option of "rebase" and
"am" subcommands lost the e-mail address by mistake, which has been
corrected.

* jk/committer-date-is-author-date-fix:
  rebase: fix broken email with --committer-date-is-author-date
  am: fix broken email with --committer-date-is-author-date
  t3436: check --committer-date-is-author-date result more carefully
2020-10-26 14:59:58 -07:00
Jeff King 2020451c5b am, sequencer: stop parsing our own committer ident
For the --committer-date-is-author-date option of git-am and git-rebase,
we format the committer ident, then re-parse it to find the name and
email, and then feed those back to fmt_ident().

We can simplify this by handling it all at the time of the fmt_ident()
call. We pass in the appropriate getenv() results, and if they're not
present, then our WANT_COMMITTER_IDENT flag tells fmt_ident() to fill in
the appropriate value from the config. Which is exactly what
git_committer_ident() was doing under the hood.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-26 09:59:57 -07:00
Jeff King 16b0bb99ea am: fix broken email with --committer-date-is-author-date
Commit e8cbe2118a (am: stop exporting GIT_COMMITTER_DATE, 2020-08-17)
rewrote the code for setting the committer date to use fmt_ident(),
rather than setting an environment variable and letting commit_tree()
handle it. But it introduced two bugs:

  - we use the author email string instead of the committer email

  - when parsing the committer ident, we used the wrong variable to
    compute the length of the email, resulting in it always being a
    zero-length string

This commit fixes both, which causes our test of this option via the
rebase "apply" backend to now succeed.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-23 08:25:19 -07:00
Michał Kępień ec7967cfaf merge-base, xdiff: zero out xpparam_t structures
xpparam_t structures are usually zero-initialized before their specific
fields are assigned to, but there are three locations in the tree where
that does not happen.  Add the missing memset() calls in order to make
initialization of xpparam_t structures consistent tree-wide and to
prevent stack garbage from being used as field values.

Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:53:26 -07:00
Bradley M. Kuhn 3abd4a67d9 Documentation: stylistically normalize references to Signed-off-by:
Ted reported an old typo in the git-commit.txt and merge-options.txt.
Namely, the phrase "Signed-off-by line" was used without either a
definite nor indefinite article.

Upon examination, it seems that the documentation (including items in
Documentation/, but also option help strings) have been quite
inconsistent on usage when referring to `Signed-off-by`.

First, very few places used a definite or indefinite article with the
phrase "Signed-off-by line", but that was the initial typo that led
to this investigation.  So, normalize using either an indefinite or
definite article consistently.

The original phrasing, in Commit 3f971fc425 (Documentation updates,
2005-08-14), is "Add Signed-off-by line".  Commit 6f855371a5 (Add
--signoff, --check, and long option-names. 2005-12-09) switched to
using "Add `Signed-off-by:` line", but didn't normalize the former
commit to match.  Later commits seem to have cut and pasted from one
or the other, which is likely how the usage became so inconsistent.

Junio stated on the git mailing list in
<xmqqy2k1dfoh.fsf@gitster.c.googlers.com> a preference to leave off
the colon.  Thus, prefer `Signed-off-by` (with backticks) for the
documentation files and Signed-off-by (without backticks) for option
help strings.

Additionally, Junio argued that "trailer" is now the standard term to
refer to `Signed-off-by`, saying that "becomes plenty clear that we
are not talking about any random line in the log message".  As such,
prefer "trailer" over "line" anywhere the former word fits.

However, leave alone those few places in documentation that use
Signed-off-by to refer to the process (rather than the specific
trailer), or in places where mail headers are generally discussed in
comparison with Signed-off-by.

Reported-by: "Theodore Y. Ts'o" <tytso@mit.edu>
Signed-off-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 11:57:40 -07:00
Thomas Koutcher 567ad2c0f9 credential: load default config
Make `git credential fill` honour the core.askPass variable.

Signed-off-by: Thomas Koutcher <thomas.koutcher@online.fr>
[jk: added test]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:30:45 -07:00
Pranit Bauva b0f6494f70 bisect--helper: retire --bisect-autostart subcommand
The `--bisect-autostart` subcommand is no longer used from the
git-bisect.sh shell script. Instead the function
`bisect_autostart()` is directly called from the C implementation.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:24:20 -07:00
Pranit Bauva 5c517fe345 bisect--helper: retire --write-terms subcommand
The `--write-terms` subcommand is no longer used from the
git-bisect.sh shell script. Instead the function `write_terms()`
is called from the C implementation of `set_terms()` and
`bisect_start()`.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:24:20 -07:00
Pranit Bauva 9b437b056d bisect--helper: retire --check-expected-revs subcommand
The `--check-expected-revs` subcommand is no longer used from the
git-bisect.sh shell script. Functions `check_expected_revs` and
`is_expected_revs` are also deleted.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:24:20 -07:00
Pranit Bauva 27257bc466 bisect--helper: reimplement bisect_state & bisect_head shell functions in C
Reimplement the `bisect_state()` shell functions in C and also add a
subcommand `--bisect-state` to `git-bisect--helper` to call them from
git-bisect.sh .

Using `--bisect-state` subcommand is a temporary measure to port shell
function to C so as to use the existing test suite. As more functions
are ported, this subcommand will be retired and will be called by some
other methods.

`bisect_head()` is only called from `bisect_state()`, thus it is not
required to introduce another subcommand.

Note that the `eval` in the changed line of `git-bisect.sh` cannot be
dropped: it is necessary because the `rev` and the `tail`
variables may contain multiple, quoted arguments that need to be
passed to `bisect--helper` (without the quotes, naturally).

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:24:20 -07:00
Pranit Bauva 04774b4e70 bisect--helper: retire --next-all subcommand
The `--next-all` subcommand is no longer used from the git-bisect.sh
shell script. Instead the function `bisect_next_all()` is called from
the C implementation of `bisect_next()`.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:24:20 -07:00
Pranit Bauva e4396072e7 bisect--helper: retire --bisect-clean-state subcommand
The `--bisect-clean-state` subcommand is no longer used from the
git-bisect.sh shell script. Instead the function
`bisect_clean_state()` is directly called from the C
implementation.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:24:20 -07:00
Pranit Bauva 88ad372fc0 bisect--helper: finish porting bisect_start() to C
Add the subcommand to `git bisect--helper` and call it from
git-bisect.sh.

With the conversion of `bisect_auto_next()` from shell to C in a
previous commit, `bisect_start()` can now be fully ported to C.

So let's complete the `--bisect-start` subcommand of
`git bisect--helper` so that it fully implements `bisect_start()`,
and let's use this subcommand in `git-bisect.sh` instead of
`bisect_start()`.

Note that the `eval` in the changed line of `git-bisect.sh` cannot be
dropped: it is necessary because the `rev` and the `tail`
variables may contain multiple, quoted arguments that need to be
passed to `bisect--helper` (without the quotes, naturally).

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 12:24:20 -07:00
Jeff King 6db29ab213 fast-import: remove duplicated option-parsing line
Commit 1bdca81641 (fast-import: add options for rewriting submodules,
2020-02-22) accidentally added two lines parsing the option
"rewrite-submodules-from". This didn't do anything in practice, because
they're in an if/else chain and so the second one can never trigger.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 08:48:47 -07:00
Derrick Stolee 61f7a383d3 maintenance: use 'incremental' strategy by default
The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.

To make it simple for users to start background maintenance with a
recommended schedlue, update the 'maintenance.strategy' config option in
both the 'register' and 'start' subcommands. This allows users to
customize beyond the defaults using individual
'maintenance.<task>.schedule' options, but also the user can opt-out of
this strategy using 'maintenance.strategy=none'.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 08:36:42 -07:00
Derrick Stolee a4cb1a2339 maintenance: create maintenance.strategy config
To provide an on-ramp for users to use background maintenance without
several 'git config' commands, create a 'maintenance.strategy' config
option. Currently, the only important value is 'incremental' which
assigns the following schedule:

* gc: never
* prefetch: hourly
* commit-graph: hourly
* loose-objects: daily
* incremental-repack: daily

These tasks are chosen to minimize disruptions to foreground Git
commands and use few compute resources.

The 'maintenance.strategy' is intended as a baseline that can be
customzied further by manually assigning 'maintenance.<task>.enabled'
and 'maintenance.<task>.schedule' config options, which will override
any recommendation from 'maintenance.strategy'. This operates similarly
to config options like 'feature.experimental' which operate as "meta"
config options that change default config values.

This presents a way forward for updating the 'incremental' strategy in
the future or adding new strategies. For example, a potential strategy
could be to include a 'full' strategy that runs the 'gc' task weekly
and no other tasks by default.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-16 08:36:42 -07:00
Jeff King 3f018ec716 fast-import: fix over-allocation of marks storage
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.

This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:

  - we added a call in option_rewrite_submodules() which uses a separate
    mark set; pushing down on "marks" is outright wrong here. We'd
    corrupt the "marks" set, and we'd fail to correctly store any
    submodule mappings with an id over 1024.

  - the other callers passed "marks", but the push-down was still wrong.
    In read_mark_file(), we take the pointer to the mark_set as a
    parameter. So even though insert_mark() was updating the global
    "marks", the local pointer we had in read_mark_file() was not
    updated. As a result, we'd add a new level when needed, but then the
    next call to insert_mark() wouldn't see it! It would then allocate a
    new layer, which would also not be seen, and so on. Lookups for the
    lost layers obviously wouldn't work, but before we even hit any
    lookup stage, we'd generally run out of memory and die.

Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).

We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.

Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-15 10:30:53 -07:00
René Scharfe db7d07f610 blame: handle deref_tag() returning NULL
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-12 12:25:14 -07:00
René Scharfe e30b1525fb grep: handle deref_tag() returning NULL
deref_tag() can return NULL.  Exit gracefully in that case instead
of blindly dereferencing the return value.

.name shouldn't ever be NULL, but grep_object() handles that case
explicitly, so let's be defensive here as well and show the broken
object's ID if it happens to lack a name after all.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-12 12:25:14 -07:00
Rafael Silva c57b3367be worktree: teach list to annotate locked worktree
The "git worktree list" shows the absolute path to the working tree,
the commit that is checked out and the name of the branch. It is not
immediately obvious which of the worktrees, if any, are locked.

"git worktree remove" refuses to remove a locked worktree with
an error message. If "git worktree list" told which worktrees
are locked in its output, the user would not even attempt to
remove such a worktree, or would realize that
"git worktree remove -f -f <path>" is required.

Teach "git worktree list" to append "locked" to its output.
The output from the command becomes like so:

    $ git worktree list
    /path/to/main             abc123 [master]
    /path/to/worktree         456def (detached HEAD)
    /path/to/locked-worktree  123abc (detached HEAD) locked

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-12 12:24:29 -07:00
Derrick Stolee d334107c5d maintenance: core.commitGraph=false prevents writes
Recently, a user had an issue due to combining
fetch.writeCommitGraph=true with core.commitGraph=false. The root bug
has been resolved by preventing commit-graph writes when
core.commitGraph is disabled. This happens inside the 'git commit-graph
write' command, but we can be more aware of this situation and prevent
that process from ever starting in the 'commit-graph' maintenance task.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-12 12:13:21 -07:00
Junio C Hamano d620daaa34 Merge branch 'ja/misc-doc-fixes'
Doc fixes.

* ja/misc-doc-fixes:
  doc: fix the bnf like style of some commands
  doc: git-remote fix ups
  doc: use linkgit macro where needed.
  git-bisect-lk2009: make continuation of list indented
2020-10-08 21:53:26 -07:00
Junio C Hamano c7ac8c0a7c Merge branch 'jk/index-pack-hotfixes'
Hotfix and clean-up for the jt/threaded-index-pack topic that has
graduated to v2.29-rc0.

* jk/index-pack-hotfixes:
  index-pack: make get_base_data() comment clearer
  index-pack: drop type_cas mutex
  index-pack: restore "resolving deltas" progress meter
2020-10-08 21:53:26 -07:00
Jean-Noël Avila 9f443f5531 doc: fix the bnf like style of some commands
In command line options, variables are entered between < and >

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08 14:01:19 -07:00
Derrick Stolee 8f801804be maintenance: test commit-graph auto condition
The auto condition for the commit-graph maintenance task walks refs
looking for commits that are not in the commit-graph file. This was
added in 4ddc79b2 (maintenance: add auto condition for commit-graph
task, 2020-09-17) but was left untested.

The initial goal of this change was to demonstrate the feature works
properly by adding tests. However, there was an off-by-one error that
caused the basic tests around maintenance.commit-graph.auto=1 to fail
when it should work.

The subtlety is that if a ref tip is not in the commit-graph, then we
were not adding that to the total count. In the test, we see that we
have only added one commit since our last commit-graph write, so the
auto condition would say there is nothing to do.

The fix is simple: add the check for the commit-graph position to see
that the tip is not in the commit-graph file before starting our walk.
Since this happens before adding to the DFS stack, we do not need to
clear our (currently empty) commit list.

This does add some extra complexity for the test, because we also want
to verify that the walk along the parents actually does some work. This
means we need to add at least two commits in a row without writing the
commit-graph. However, we also need to make sure no additional refs are
pointing to the middle of this list or else the for_each_ref() in
should_write_commit_graph() might visit these commits as tips instead of
doing a DFS walk. Hence, the last two commits are added with "git
commit" instead of "test_commit".

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08 10:24:40 -07:00
Denton Liu 64f1f58fe7 checkout: learn to respect checkout.guess
The current behavior of git checkout/switch is that --guess is currently
enabled by default. However, some users may not wish for this to happen
automatically. Instead of forcing users to specify --no-guess manually
each time, teach these commands the checkout.guess configuration
variable that gives users the option to set a default behavior.

Teach the completion script to recognize the new config variable and
disable DWIM logic if it is set to false.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-08 09:25:29 -07:00
Jonathan Tan ec6a8f9705 index-pack: make get_base_data() comment clearer
A comment mentions that we may free cached delta bases via
find_unresolved_deltas(), but that function went away in f08cbf60fe
(index-pack: make quantum of work smaller, 2020-09-08). Since we need to
rewrite that comment anyway, make the entire comment clearer.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 13:32:27 -07:00
Jeff King bebe171947 index-pack: drop type_cas mutex
The type_cas lock lost all of its callers in f08cbf60fe (index-pack:
make quantum of work smaller, 2020-09-08), so we can safely delete it.
The compiler didn't alert us that the variable became unused, because we
still call pthread_mutex_init() and pthread_mutex_destroy() on it.

It's worth considering also whether that commit was in error to remove
the use of the lock. Why don't we need it now, if we did before, as
described in ab791dd138 (index-pack: fix race condition with duplicate
bases, 2014-08-29)? I think the answer is that we now look at and assign
the child_obj->real_type field in the main thread while holding the
work_lock(). So we don't have to worry about racing with the worker
threads.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 11:51:26 -07:00
Jeff King cea69151a4 index-pack: restore "resolving deltas" progress meter
Commit f08cbf60fe (index-pack: make quantum of work smaller, 2020-09-08)
refactored the main loop in threaded_second_pass(), but also deleted the
call to display_progress() at the top of the loop. This means that users
typically see no progress at all during the delta resolution phase (and
for large repositories, Git appears to hang).

This looks like an accident that was unrelated to the intended change of
that commit, since we continue to update nr_resolved_deltas in
resolve_delta(). Let's restore the call to get that progress back.

We'll also add a test that confirms we generate the expected progress.
This isn't perfect, as it wouldn't catch a bug where progress was
delayed to the end. That was probably possible to trigger when receiving
a thin pack, because we'd eventually call display_progress() from
fix_unresolved_deltas(), but only once after doing all the work.
However, since our test case generates a complete pack, it reliably
demonstrates this particular bug and its fix. And we can't do better
without making the test racy.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 11:50:09 -07:00
Denton Liu 5602b500c3 builtin/checkout: fix git checkout -p HEAD... bug
Running `git checkout -p` with a merge-base rev results in an error:

	$ git checkout -p HEAD...
	usage: git diff-index [-m] [--cached] [<common-diff-options>] <tree-ish> [<path>...]
	common diff options:
	  -z            output diff-raw with lines terminated with NUL.
	  -p            output patch format.
	  -u            synonym for -p.
	  --patch-with-raw
			output both a patch and the diff-raw format.
	  --stat        show diffstat instead of patch.
	  --numstat     show numeric diffstat instead of patch.
	  --patch-with-stat
			output a patch and prepend its diffstat.
	  --name-only   show only names of changed files.
	  --name-status show names and status of changed files.
	  --full-index  show full object name on index lines.
	  --abbrev=<n>  abbreviate object names in diff-tree header and diff-raw.
	  -R            swap input file pairs.
	  -B            detect complete rewrites.
	  -M            detect renames.
	  -C            detect copies.
	  --find-copies-harder
			try unchanged files as candidate for copy detection.
	  -l<n>         limit rename attempts up to <n> paths.
	  -O<file>      reorder diffs according to the <file>.
	  -S<string>    find filepair whose only one side contains the string.
	  --pickaxe-all
			show all files diff when -S is used and hit is found.
	  -a  --text    treat all files as text.

	Cannot close git diff-index --cached --numstat --summary HEAD... -- () at <redacted>/libexec/git-core/git-add--interactive line 183.

This happens because checkout passes the literal argument (in the
example, `HEAD...`) to diff-index which does not recognise merge-base
revs.

Fix this by using the hex of the found commit instead of the given name.
Note that "HEAD" is handled specially in run_add_interactive() so it's
explicitly not changed.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 09:49:05 -07:00
Junio C Hamano 5f8c70a148 Merge branch 'jk/format-auto-base-when-able'
"git format-patch" learns to take "whenAble" as a possible value
for the format.useAutoBase configuration variable to become no-op
when the  automatically computed base does not make sense.

* jk/format-auto-base-when-able:
  format-patch: teach format.useAutoBase "whenAble" option
2020-10-05 14:01:55 -07:00
Junio C Hamano 8e3ec76a20 Merge branch 'jk/refspecs-negative'
"git fetch" and "git push" support negative refspecs.

* jk/refspecs-negative:
  refspec: add support for negative refspecs
2020-10-05 14:01:54 -07:00
Junio C Hamano e68f0a4e57 Merge branch 'jt/keep-partial-clone-filter-upon-lazy-fetch'
The lazy fetching done internally to make missing objects available
in a partial clone incorrectly made permanent damage to the partial
clone filter in the repository, which has been corrected.

* jt/keep-partial-clone-filter-upon-lazy-fetch:
  fetch: do not override partial clone filter
  promisor-remote: remove unused variable
2020-10-05 14:01:53 -07:00
Junio C Hamano 19dd352d03 Merge branch 'jk/unused'
Code cleanup.

* jk/unused:
  dir.c: drop unused "untracked" from treat_path_fast()
  sequencer: handle ignore_footer when parsing trailers
  test-advise: check argument count with argc instead of argv
  sparse-checkout: fill in some options boilerplate
  sequencer: drop repository argument from run_git_commit()
  push: drop unused repo argument to do_push()
  assert PARSE_OPT_NONEG in parse-options callbacks
  env--helper: write to opt->value in parseopt helper
  drop unused argc parameters
  convert: drop unused crlf_action from check_global_conv_flags_eol()
2020-10-05 14:01:52 -07:00
Junio C Hamano 34415c76c8 Merge branch 'so/combine-diff-simplify'
Code simplification.

* so/combine-diff-simplify:
  diff: get rid of redundant 'dense' argument
2020-10-05 14:01:51 -07:00
Junio C Hamano 58138d3f26 Merge branch 'js/default-branch-name-part-2'
Update the tests to drop word 'master' from them.

* js/default-branch-name-part-2:
  t9902: avoid using the branch name `master`
  tests: avoid variations of the `master` branch name
  t3200: avoid variations of the `master` branch name
  fast-export: avoid using unnecessary language in a code comment
  t/test-terminal: avoid non-inclusive language
2020-10-05 14:01:50 -07:00
Junio C Hamano 2fa8aacc72 Merge branch 'jk/shortlog-group-by-trailer'
"git shortlog" has been taught to group commits by the contents of
the trailer lines, like "Reviewed-by:", "Coauthored-by:", etc.

* jk/shortlog-group-by-trailer:
  shortlog: allow multiple groups to be specified
  shortlog: parse trailer idents
  shortlog: rename parse_stdin_ident()
  shortlog: de-duplicate trailer values
  shortlog: match commit trailers with --group
  trailer: add interface for iterating over commit trailers
  shortlog: add grouping option
  shortlog: change "author" variables to "ident"
2020-10-04 12:49:14 -07:00
Junio C Hamano f4cc68cbd0 Merge branch 'mr/bisect-in-c-2'
Rewrite of the "git bisect" script in C continues.

* mr/bisect-in-c-2:
  bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C
  bisect: call 'clear_commit_marks_all()' in 'bisect_next_all()'
  bisect--helper: reimplement `bisect_autostart` shell function in C
  bisect--helper: introduce new `write_in_file()` function
  bisect--helper: use '-res' in 'cmd_bisect__helper' return
  bisect--helper: BUG() in cmd_*() on invalid subcommand
2020-10-04 12:49:08 -07:00
Junio C Hamano 03a01824a4 Merge branch 'cc/bisect-start-fix'
"git bisect start X Y", when X and Y are not valid committish
object names, should take X and Y as pathspec, but didn't.

* cc/bisect-start-fix:
  bisect: don't use invalid oid as rev when starting
2020-10-04 12:49:08 -07:00
Junio C Hamano 230ff3e997 Merge branch 'jc/blame-ignore-fix'
"git blame --ignore-rev/--ignore-revs-file" failed to validate
their input are valid revision, and failed to take into account
that the user may want to give an annotated tag instead of a
commit, which has been corrected.

* jc/blame-ignore-fix:
  blame: validate and peel the object names on the ignore list
  t8013: minimum preparatory clean-up
2020-10-04 12:49:07 -07:00
Junio C Hamano 86cca370e1 Merge branch 'jk/drop-unaligned-loads'
Compilation fix around type punning.

* jk/drop-unaligned-loads:
  Revert "fast-export: use local array to store anonymized oid"
  bswap.h: drop unaligned loads
2020-10-04 12:49:06 -07:00
Srinidhi Kaushik 3b990aa645 push: parse and set flag for "--force-if-includes"
The previous commit added the necessary machinery to implement the
"--force-if-includes" protection, when "--force-with-lease" is used
without giving exact object the remote still ought to have. Surface
the feature by adding a command line option and a configuration
variable to enable it.

 - Add a flag: "TRANSPORT_PUSH_FORCE_IF_INCLUDES" to indicate that the
   new option was passed from the command line of via configuration
   settings; update command line and configuration parsers to set the
   new flag accordingly.

 - Introduce a new configuration option "push.useForceIfIncludes", which
   is equivalent to setting "--force-if-includes" in the command line.

 - Update "remote-curl" to recognize and pass this option to "send-pack"
   when enabled.

 - Update "advise" to catch the reject reason "REJECT_REF_NEEDS_UPDATE",
   set when the ref status is "REF_STATUS_REJECT_REMOTE_UPDATED" and
   (optionally) print a help message when the push fails.

 - The new option is a "no-op" in the following scenarios:
    * When used without "--force-with-lease".
    * When used with "--force-with-lease", and if the expected commit
      on the remote side is specified as an argument.

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-03 09:59:19 -07:00
Srinidhi Kaushik 99a1f9ae10 push: add reflog check for "--force-if-includes"
Add a check to verify if the remote-tracking ref of the local branch
is reachable from one of its "reflog" entries.

The check iterates through the local ref's reflog to see if there
is an entry for the remote-tracking ref and collecting any commits
that are seen, into a list; the iteration stops if an entry in the
reflog matches the remote ref or if the entry timestamp is older
the latest entry of the remote ref's "reflog". If there wasn't an
entry found for the remote ref, "in_merge_bases_many()" is called
to check if it is reachable from the list of collected commits.

When a local branch that is based on a remote ref, has been rewound
and is to be force pushed on the remote, "--force-if-includes" runs
a check that ensures any updates to the remote-tracking ref that may
have happened (by push from another repository) in-between the time
of the last update to the local branch (via "git-pull", for instance)
and right before the time of push, have been integrated locally
before allowing a forced update.

If the new option is passed without specifying "--force-with-lease",
or specified along with "--force-with-lease=<refname>:<expect>" it
is a "no-op".

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-03 09:59:18 -07:00
Jacob Keller 7efba5fa39 format-patch: teach format.useAutoBase "whenAble" option
The format.useAutoBase configuration option exists to allow users to
enable '--base=auto' for format-patch by default.

This can sometimes lead to poor workflow, due to unexpected failures
when attempting to format an ancient patch:

    $ git format-patch -1 <an old commit>
    fatal: base commit shouldn't be in revision list

This can be very confusing, as it is not necessarily immediately obvious
that the user requested a --base (since this was in the configuration,
not on the command line).

We do want --base=auto to fail when it cannot provide a suitable base,
as it would be equally confusing if a formatted patch did not include
the base information when it was requested.

Teach format.useAutoBase a new mode, "whenAble". This mode will cause
format-patch to attempt to include a base commit when it can. However,
if no valid base commit can be found, then format-patch will continue
formatting the patch without a base commit.

In order to avoid making yet another branch name unusable with --base,
do not teach --base=whenAble or --base=whenable.

Instead, refactor the base_commit option to use a callback, and rely on
the global configuration variable auto_base.

This does mean that a user cannot request this optional base commit
generation from the command line. However, this is likely not too
valuable. If the user requests base information manually, they will be
immediately informed of the failure to acquire a suitable base commit.
This allows the user to make an informed choice about whether to
continue the format.

Add tests to cover the new mode of operation for --base.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-01 15:22:10 -07:00
Sean Barag de9ed3ef37 clone: allow configurable default for -o/--origin
While the default remote name of "origin" can be changed at clone-time
with `git clone`'s `--origin` option, it was previously not possible
to specify a default value for the name of that remote.  Add support for
a new `clone.defaultRemoteName` config, with the newly-created remote
name resolved in priority order:

1. (Highest priority) A remote name passed directly to `git clone -o`
2. A `clone.defaultRemoteName=new_name` in config `git clone -c`
3. A `clone.defaultRemoteName` value set in `/path/to/template/config`,
   where `--template=/path/to/template` is provided
4. A `clone.defaultRemoteName` value set in a non-template config file
5. The default value of `origin`

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 22:09:13 -07:00
Sean Barag 75ca3906b1 clone: read new remote name from remote_name instead of option_origin
In a future patch, the name of the remote created by `git clone` may
come from multiple sources.  To avoid confusion, convert most uses of
option_origin to remote_name, leaving option_origin to exclusively
represent the -o/--origin option.

Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 22:09:13 -07:00
Sean Barag ebe7e28a36 clone: validate --origin option before use
Providing a bad origin name to `git clone` currently reports an
'invalid refspec' error instead of a more explicit message explaining
that the `--origin` option was malformed.  This behavior dates back to
since 8434c2f1 (Build in clone, 2008-04-27).  Reintroduce
validation for the provided `--origin` option, but notably _don't_
include a multi-level check (e.g. "foo/bar") that was present in the
original `git-clone.sh`.  `git remote` allows multi-level remote names
since at least 46220ca100 (remote.c: Fix overtight refspec validation,
2008-03-20), so that appears to be the desired behavior.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 22:09:13 -07:00
Sean Barag f2c6fda886 refs: consolidate remote name validation
In preparation for a future patch, extract from remote.c a function that
validates possible remote names so that its rules can be used
consistently in other places.

Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 22:09:13 -07:00
Sean Barag 552955ed7f clone: use more conventional config/option layering
Parsing command-line options before reading from config required careful
handling to ensure CLI options were treated with higher priority.  Read
config first to let parsed CLI naively overwrite matching config values.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 22:09:13 -07:00
Jacob Keller c0192df630 refspec: add support for negative refspecs
Both fetch and push support pattern refspecs which allow fetching or
pushing references that match a specific pattern. Because these patterns
are globs, they have somewhat limited ability to express more complex
situations.

For example, suppose you wish to fetch all branches from a remote except
for a specific one. To allow this, you must setup a set of refspecs
which match only the branches you want. Because refspecs are either
explicit name matches, or simple globs, many patterns cannot be
expressed.

Add support for a new type of refspec, referred to as "negative"
refspecs. These are prefixed with a '^' and mean "exclude any ref
matching this refspec". They can only have one "side" which always
refers to the source. During a fetch, this refers to the name of the ref
on the remote. During a push, this refers to the name of the ref on the
local side.

With negative refspecs, users can express more complex patterns. For
example:

 git fetch origin refs/heads/*:refs/remotes/origin/* ^refs/heads/dontwant

will fetch all branches on origin into remotes/origin, but will exclude
fetching the branch named dontwant.

Refspecs today are commutative, meaning that order doesn't expressly
matter. Rather than forcing an implied order, negative refspecs will
always be applied last. That is, in order to match, a ref must match at
least one positive refspec, and match none of the negative refspecs.
This is similar to how negative pathspecs work.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 14:52:00 -07:00
Jeff King 75d3bee157 sparse-checkout: fill in some options boilerplate
The sparse-checkout passes along argv and argc to its sub-command helper
functions. Many of these sub-commands do not yet take any command-line
options, and ignore those parameters.

Let's instead add empty option lists and make sure we call
parse_options(). That will give a useful error message for something
like:

  git sparse-checkout list --nonsense

which currently just silently ignores the unknown option.

As a bonus, it also silences some -Wunused-parameter warnings.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 12:53:48 -07:00
Jeff King 5b9427e0ac push: drop unused repo argument to do_push()
We stopped using the "repo" argument in 8e4c8af058 (push: disallow --all
and refspecs when remote.<name>.mirror is set, 2019-09-02), which moved
the pushremote handling to its caller.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 12:53:47 -07:00
Jeff King 8d2aa8dfac assert PARSE_OPT_NONEG in parse-options callbacks
In the spirit of 517fe807d6 (assert NOARG/NONEG behavior of
parse-options callbacks, 2018-11-05), let's cover some parse-options
callbacks which expect to be used with PARSE_OPT_NONEG but don't
explicitly assert that this is the case. These callbacks are all used
correctly in the current code, but this will help document their
expectations and future-proof the code.

As a bonus, it also silences -Wunused-parameters (these were added since
the initial sweep of 517fe807d6, and we can't yet turn on
-Wunused-parameters to remind people because it has too many existing
false positives).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 12:53:47 -07:00
Jeff King 424e28fcad env--helper: write to opt->value in parseopt helper
We use OPT_CALLBACK_F() to call the option_parse_type() callback,
passing it the address of "cmdmode" as the value to write to. But the
callback doesn't look at opt->value at all, and instead writes to a
global variable.

This works out because that's the same global variable we happen to pass
in, but it's rather confusing.  Let's use the passed-in value instead.
We'll also make "cmdmode" a local variable of the main function,
ensuring we can't make the same mistake again.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 12:53:47 -07:00
Jeff King e885a84f1b drop unused argc parameters
Many functions take an argv/argc pair, but never actually look at argc.
This makes it useless at best (we use the NULL sentinel in argv to find
the end of the array), and misleading at worst (what happens if the argc
count does not match the argv NULL?).

In each of these instances, the argv NULL does match the argc count, so
there are no bugs here. But let's tighten the interfaces to make it
harder to get wrong (and to reduce some -Wunused-parameter complaints).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 12:53:47 -07:00
Junio C Hamano 299deeac8a Merge branch 'ah/pull'
Earlier we taught "git pull" to warn when the user does not say the
histories need to be merged, rebased or accepts only fast-
forwarding, but the warning triggered for those who have set the
pull.ff configuration variable.

* ah/pull:
  pull: don't warn if pull.ff has been set
2020-09-29 14:01:22 -07:00
Junio C Hamano b28919c7bc Merge branch 'bc/clone-with-git-default-hash-fix'
"git clone" that clones from SHA-1 repository, while
GIT_DEFAULT_HASH set to use SHA-256 already, resulted in an
unusable repository that half-claims to be SHA-256 repository
with SHA-1 objects and refs.  This has been corrected.

* bc/clone-with-git-default-hash-fix:
  builtin/clone: avoid failure with GIT_DEFAULT_HASH
2020-09-29 14:01:20 -07:00
Junio C Hamano 288ed98bf7 Merge branch 'tb/bloom-improvements'
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.

* tb/bloom-improvements:
  commit-graph: introduce 'commitGraph.maxNewFilters'
  builtin/commit-graph.c: introduce '--max-new-filters=<n>'
  commit-graph: rename 'split_commit_graph_opts'
  bloom: encode out-of-bounds filters as non-empty
  bloom/diff: properly short-circuit on max_changes
  bloom: use provided 'struct bloom_filter_settings'
  bloom: split 'get_bloom_filter()' in two
  commit-graph.c: store maximum changed paths
  commit-graph: respect 'commitGraph.readChangedPaths'
  t/helper/test-read-graph.c: prepare repo settings
  commit-graph: pass a 'struct repository *' in more places
  t4216: use an '&&'-chain
  commit-graph: introduce 'get_bloom_filter_settings()'
2020-09-29 14:01:20 -07:00
Sergey Organov d01141de5a diff: get rid of redundant 'dense' argument
Get rid of 'dense' argument that is redundant for every function that has
'struct rev_info *rev' argument as well, as the value of 'dense' passed is
always taken from 'rev->dense_combined_merges' field.

The only place where this was not the case is in 'submodule.c' where
'diff_tree_combined_merge()' was called with '1' for 'dense' argument. However,
at that call the 'revs' instance used is local to the function, and we now just
set 'revs->dense_combined_merges' to 1 in this local instance.

Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-29 11:54:53 -07:00
Jonathan Tan 23547c4051 fetch: do not override partial clone filter
When a fetch with the --filter argument is made, the configured default
filter is set even if one already exists. This change was made in
5e46139376 ("builtin/fetch: remove unique promisor remote limitation",
2019-06-25) - in particular, changing from:

 * If this is the FIRST partial-fetch request, we enable partial
 * on this repo and remember the given filter-spec as the default
 * for subsequent fetches to this remote.

to:

 * If this is a partial-fetch request, we enable partial on
 * this repo if not already enabled and remember the given
 * filter-spec as the default for subsequent fetches to this
 * remote.

(The given filter-spec is "remembered" even if there is already an
existing one.)

This is problematic whenever a lazy fetch is made, because lazy fetches
are made using "git fetch --filter=blob:none", but this will also happen
if the user invokes "git fetch --filter=<filter>" manually. Therefore,
restore the behavior prior to 5e46139376, which writes a filter-spec
only if the current fetch request is the first partial-fetch one (for
that remote).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-28 16:11:59 -07:00
Jeff King 63d24fa0b0 shortlog: allow multiple groups to be specified
Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.

This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:

  git shortlog -ns --group=author --group=trailer:co-authored-by

to get a shortlog that counts authors and co-authors equally.

The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.

The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).

There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Jeff King 56d5dde752 shortlog: parse trailer idents
Trailers don't necessarily contain name/email identity values, so
shortlog has so far treated them as opaque strings. However, since many
trailers do contain identities, it's useful to treat them as such when
they can be parsed. That lets "-e" work as usual, as well as mailmap.

When they can't be parsed, we'll continue with the old behavior of
treating them as a single string (there's no new test for that here,
since the existing tests cover a trailer like this).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Jeff King 87abb96222 shortlog: rename parse_stdin_ident()
This function is actually useful for parsing any identity, whether from
stdin or not. We'll need it for handling trailers.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Jeff King f17b0b99bf shortlog: de-duplicate trailer values
The current documentation is vague about what happens with
--group=trailer:signed-off-by when we see a commit with:

  Signed-off-by: One
  Signed-off-by: Two
  Signed-off-by: One

We clearly should credit both "One" and "Two", but should "One" get
credited twice? The current code does so, but mostly because that was
the easiest thing to do. It's probably more useful to count each commit
at most once. This will become especially important when we allow
values from multiple sources in a future patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Jeff King 47beb37bc6 shortlog: match commit trailers with --group
If a project uses commit trailers, this patch lets you use
shortlog to see who is performing each action. For example,
running:

  git shortlog -ns --group=trailer:reviewed-by

in git.git shows who has reviewed. You can even use a custom
format to see things like who has helped whom:

  git shortlog --format="...helped %an (%ad)" \
               --group=trailer:helped-by

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Jeff King 92338c450b shortlog: add grouping option
In preparation for adding more grouping types, let's refactor the
committer/author grouping code and add a user-facing option that binds
them together. In particular:

  - the main option is now "--group", to make it clear
    that the various group types are mutually exclusive. The
    "--committer" option is an alias for "--group=committer".

  - we keep an enum rather than a binary flag, to prepare
    for more values

  - we prefer switch statements to ternary assignment, since
    other group types will need more custom code

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-27 12:21:05 -07:00
Junio C Hamano 6c430a647c Merge branch 'jx/proc-receive-hook'
"git receive-pack" that accepts requests by "git push" learned to
outsource most of the ref updates to the new "proc-receive" hook.

* jx/proc-receive-hook:
  doc: add documentation for the proc-receive hook
  transport: parse report options for tracking refs
  t5411: test updates of remote-tracking branches
  receive-pack: new config receive.procReceiveRefs
  doc: add document for capability report-status-v2
  New capability "report-status-v2" for git-push
  receive-pack: feed report options to post-receive
  receive-pack: add new proc-receive hook
  t5411: add basic test cases for proc-receive hook
  transport: not report a non-head push as a branch
2020-09-25 15:25:39 -07:00
Junio C Hamano 48794acc50 Merge branch 'ds/maintenance-part-1'
A "git gc"'s big brother has been introduced to take care of more
repository maintenance tasks, not limited to the object database
cleaning.

* ds/maintenance-part-1:
  maintenance: add trace2 regions for task execution
  maintenance: add auto condition for commit-graph task
  maintenance: use pointers to check --auto
  maintenance: create maintenance.<task>.enabled config
  maintenance: take a lock on the objects directory
  maintenance: add --task option
  maintenance: add commit-graph task
  maintenance: initialize task array
  maintenance: replace run_auto_gc()
  maintenance: add --quiet option
  maintenance: create basic maintenance runner
2020-09-25 15:25:38 -07:00
Derrick Stolee 2fec604f8d maintenance: add start/stop subcommands
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.

The schedule is laid out as follows:

  0 1-23 * * *   $cmd maintenance run --schedule=hourly
  0 0    * * 1-6 $cmd maintenance run --schedule=daily
  0 0    * * 0   $cmd maintenance run --schedule=weekly

where $cmd is a properly-qualified 'git for-each-repo' execution:

$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo

where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.

This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.

The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee 0c18b70081 maintenance: add [un]register subcommands
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.

These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.

For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.

The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee 4950b2a2b5 for-each-repo: run subcommands on configured repos
It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.

The test is very simple, but does highlight that the "--" argument is
optional.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee b08ff1fee0 maintenance: add --schedule option and config
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.

The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:

	'hourly' > 'daily' > 'weekly'

Use new 'enum schedule_priority' to track these values numerically.

The following cron table would run the scheduled tasks with the correct
frequencies:

  0 1-23 * * *    git -C <repo> maintenance run --schedule=hourly
  0 0    * * 1-6  git -C <repo> maintenance run --schedule=daily
  0 0    * * 0    git -C <repo> maintenance run --schedule=weekly

This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:59:44 -07:00
Derrick Stolee e841a79a13 maintenance: add incremental-repack auto condition
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.

The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.

Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:05 -07:00
Derrick Stolee a13e3d0ec8 maintenance: auto-size incremental-repack batch
When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.

Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:05 -07:00
Derrick Stolee 52fe41ff1c maintenance: add incremental-repack task
The previous change cleaned up loose objects using the
'loose-objects' that can be run safely in the background. Add a
similar job that performs similar cleanups for pack-files.

One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.

Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).

The 'incremental-repack' task runs the following steps:

1. 'git multi-pack-index write' creates a multi-pack-index file if
   one did not exist, and otherwise will update the multi-pack-index
   with any new pack-files that appeared since the last write. This
   is particularly relevant with the background fetch job.

   When the multi-pack-index sees two copies of the same object, it
   stores the offset data into the newer pack-file. This means that
   some old pack-files could become "unreferenced" which I will use
   to mean "a pack-file that is in the pack-file list of the
   multi-pack-index but none of the objects in the multi-pack-index
   reference a location inside that pack-file."

2. 'git multi-pack-index expire' deletes any unreferenced pack-files
   and updaes the multi-pack-index to drop those pack-files from the
   list. This is safe to do as concurrent Git processes will see the
   multi-pack-index and not open those packs when looking for object
   contents. (Similar to the 'loose-objects' job, there are some Git
   commands that open pack-files regardless of the multi-pack-index,
   but they are rarely used. Further, a user that self-selects to
   use background operations would likely refrain from using those
   commands.)

3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
   of pack-files that are listed in the multi-pack-index and creates
   a new pack-file containing the objects whose offsets are listed
   by the multi-pack-index to be in those objects. The set of pack-
   files is selected greedily by sorting the pack-files by modified
   time and adding a pack-file to the set if its "expected size" is
   smaller than the batch size until the total expected size of the
   selected pack-files is at least the batch size. The "expected
   size" is calculated by taking the size of the pack-file divided
   by the number of objects in the pack-file and multiplied by the
   number of objects from the multi-pack-index with offset in that
   pack-file. The expected size approximates how much data from that
   pack-file will contribute to the resulting pack-file size. The
   intention is that the resulting pack-file will be close in size
   to the provided batch size.

   The next run of the incremental-repack task will delete these
   repacked pack-files during the 'expire' step.

   In this version, the batch size is set to "0" which ignores the
   size restrictions when selecting the pack-files. It instead
   selects all pack-files and repacks all packed objects into a
   single pack-file. This will be updated in the next change, but
   it requires doing some calculations that are better isolated to
   a separate change.

These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29).

These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.

By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 3e220e6069 maintenance: create auto condition for loose-objects
The loose-objects task deletes loose objects that already exist in a
pack-file, then place the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.

The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 252cfb7cb8 maintenance: add loose-objects task
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.

Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:

1. Run 'git prune-packed' to delete any loose objects that exist
   in a pack-file. Concurrent commands will prefer the packed
   version of the object to the loose version. (Of course, there
   are exceptions for commands that specifically care about the
   location of an object. These are rare for a user to run on
   purpose, and we hope a user that has selected background
   maintenance will not be trying to do foreground maintenance.)

2. Run 'git pack-objects' on a batch of loose objects. These
   objects are grouped by scanning the loose object directories in
   lexicographic order until listing all loose objects -or-
   reaching 50,000 objects. This is more than enough if the loose
   objects are created only by a user doing normal development.
   We noticed users with _millions_ of loose objects because VFS
   for Git downloads blobs on-demand when a file read operation
   requires populating a virtual file.

This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee 28cb5e66dd maintenance: add prefetch task
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.

Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.

The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.

However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.

When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:

1. --no-tags prevents getting a new tag when a user wants to see
   the new tags appear in their foreground fetches.

2. --refmap= removes the configured refspec which usually updates
   refs/remotes/<remote>/* with the refs advertised by the remote.
   While this looks confusing, this was documented and tested by
   b40a50264a (fetch: document and test --refmap="", 2020-01-21),
   including this sentence in the documentation:

	Providing an empty `<refspec>` to the `--refmap` option
	causes Git to ignore the configured refspecs and rely
	entirely on the refspecs supplied as command-line arguments.

3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
   we can ensure that we actually load the new values somewhere in
   our refspace while not updating refs/heads or refs/remotes. By
   storing these refs here, the commit-graph job will update the
   commit-graph with the commits from these hidden refs.

4. --prune will delete the refs/prefetch/<remote> refs that no
   longer appear on the remote.

5. --no-write-fetch-head prevents updating FETCH_HEAD.

We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paried with --no-show-forced-udpates (through
fetch.showForcedUpdates=false).

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Jeff King 45d93eb824 shortlog: change "author" variables to "ident"
We already match "committer", and we're about to start
matching more things. Let's use a more neutral variable to
avoid confusion.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:47:50 -07:00
Christian Couder 73c6de06af bisect: don't use invalid oid as rev when starting
In 06f5608c14 (bisect--helper: `bisect_start` shell function
partially in C, 2019-01-02), we changed the following shell
code:

-      rev=$(git rev-parse -q --verify "$arg^{commit}") || {
-              test $has_double_dash -eq 1 &&
-              die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
-              break
-      }
-      revs="$revs $rev"

into:

+      char *commit_id = xstrfmt("%s^{commit}", arg);
+      if (get_oid(commit_id, &oid) && has_double_dash)
+              die(_("'%s' does not appear to be a valid "
+                    "revision"), arg);
+
+      string_list_append(&revs, oid_to_hex(&oid));
+      free(commit_id);

In case of an invalid "arg" when "has_double_dash" is false, the old
code would "break" out of the argument loop.

In the new C code though, `oid_to_hex(&oid)` is unconditonally
appended to "revs". This is wrong first because "oid" is junk as
`get_oid(commit_id, &oid)` failed and second because it doesn't break
out of the argument loop.

Not breaking out of the argument loop means that "arg" is then not
treated as a path restriction (which is wrong).

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 09:57:48 -07:00
Alex Henrie 54200cef86 pull: don't warn if pull.ff has been set
A user who understands enough to set pull.ff does not need additional
instructions.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 23:04:27 -07:00
Junio C Hamano 610e2b9240 blame: validate and peel the object names on the ignore list
The command reads list of object names to place on the ignore list
either from the command line or from a file, but they are not
checked with their object type (those read from the file are not
even checked for object existence).

Extend the oidset_parse_file() API and allow it to take a callback
that can be used to die (e.g. when an inappropriate input is read)
or modify the object name read (e.g. when a tag pointing at a commit
is read, and the caller wants a commit object name), and use it in
the code that handles ignore list.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 22:20:58 -07:00
Jeff King 176380fd11 Revert "fast-export: use local array to store anonymized oid"
This reverts commit f39ad38410.

That commit was trying to silence a type-punning warning on older
versions of gcc. However, its analysis was all wrong. I didn't notice
that we _were_ in fact type-punning because there are two versions of
put_be32(): one that uses casts and unaligned loads, and another that
uses bitshifts. I looked at the latter, but on my platform we were
defaulting to the former.

However, as of the previous commit, we'll always use the bitshift
version. So we can drop this hackery to avoid the warning, making the
code slightly cleaner.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:30:11 -07:00
Pranit Bauva 517ecb3161 bisect--helper: reimplement bisect_next and bisect_auto_next shell functions in C
Reimplement the `bisect_next()` and the `bisect_auto_next()` shell functions
in C and add the subcommands to `git bisect--helper` to call them from
git-bisect.sh .

bisect_auto_next() function returns an enum bisect_error type as whole
`git bisect` can exit with an error code when bisect_next() does.

Return an error when `bisect_next()` fails, that fix a bug on shell script
version.

Using `--bisect-next` and `--bisect-auto-next` subcommands is a
temporary measure to port shell function to C so as to use the existing
test suite. As more functions are ported, `--bisect-auto-next`
subcommand will be retired and will be called by some other methods.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:06:30 -07:00
Pranit Bauva 09535f056b bisect--helper: reimplement bisect_autostart shell function in C
Reimplement the `bisect_autostart()` shell function in C and add the
C implementation from `bisect_next()` which was previously left
uncovered.

Add `--bisect-autostart` subcommand to be called from git-bisect.sh.
Using `--bisect-autostart` subcommand is a temporary measure to port
the shell function to C so as to use the existing test suite. As more
functions are ported, this subcommand will be retired and
bisect_autostart() will be called directly by `bisect_state()`.

Change behavior of shell script that returned success when user aborted
the bisection.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:06:30 -07:00
Junio C Hamano 6854689e65 Merge branch 'ar/fetch-ipversion-in-all'
"git fetch --all --ipv4/--ipv6" forgot to pass the protocol options
to instances of the "git fetch" that talk to individual remotes,
which has been corrected.

* ar/fetch-ipversion-in-all:
  fetch: pass --ipv4 and --ipv6 options to sub-fetches
2020-09-22 12:36:34 -07:00
Junio C Hamano 39149df364 Merge branch 'cs/don-t-pretend-a-failed-remote-set-head-succeeded'
"git remote set-head" that failed still said something that hints
the operation went through, which was misleading.

* cs/don-t-pretend-a-failed-remote-set-head-succeeded:
  remote: don't show success message when set-head fails
2020-09-22 12:36:32 -07:00
Junio C Hamano 26a3728bed Merge branch 'al/ref-filter-merged-and-no-merged'
"git for-each-ref" and friends that list refs used to allow only
one --merged or --no-merged to filter them; they learned to take
combination of both kind of filtering.

* al/ref-filter-merged-and-no-merged:
  Doc: prefer more specific file name
  ref-filter: make internal reachable-filter API more precise
  ref-filter: allow merged and no-merged filters
  Doc: cover multiple contains/no-contains filters
  t3201: test multiple branch filter combinations
2020-09-22 12:36:31 -07:00
Junio C Hamano 4aff18a3f0 Merge branch 'ls/mergetool-meld-auto-merge'
The 'meld' backend of the "git mergetool" learned to give the
underlying 'meld' the '--auto-merge' option, which would help
reduce the amount of text that requires manual merging.

* ls/mergetool-meld-auto-merge:
  mergetool: allow auto-merge for meld to follow the vim-diff behavior
2020-09-22 12:36:29 -07:00
Junio C Hamano b7e65b51e5 Merge branch 'jt/threaded-index-pack'
"git index-pack" learned to resolve deltified objects with greater
parallelism.

* jt/threaded-index-pack:
  index-pack: make quantum of work smaller
  index-pack: make resolve_delta() assume base data
  index-pack: calculate {ref,ofs}_{first,last} early
  index-pack: remove redundant child field
  index-pack: unify threaded and unthreaded code
  index-pack: remove redundant parameter
  Documentation: deltaBaseCacheLimit is per-thread
2020-09-22 12:36:28 -07:00
Junio C Hamano 634e0084fa Merge branch 'es/format-patch-interdiff-cleanup'
"format-patch --range-diff=<prev> <origin>..HEAD" has been taught
not to ignore <origin> when <prev> is a single version.

* es/format-patch-interdiff-cleanup:
  format-patch: use 'origin' as start of current-series-range when known
  diff-lib: tighten show_interdiff()'s interface
  diff: move show_interdiff() from its own file to diff-lib
2020-09-22 12:36:28 -07:00
Junio C Hamano bcb68bff80 Merge branch 'os/fetch-submodule-optim'
Optimization around submodule handling.

* os/fetch-submodule-optim:
  fetch: do not look for submodule changes in unchanged refs
2020-09-22 12:36:28 -07:00
brian m. carlson 47ac970309 builtin/clone: avoid failure with GIT_DEFAULT_HASH
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256".  This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).

This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using.  We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.

We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be.  And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.

Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1.  This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.

Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-22 09:22:32 -07:00
Johannes Schindelin 5a0c32bd4b fast-export: avoid using unnecessary language in a code comment
In an ongoing effort to avoid non-inclusive language, let's avoid using
the branch name "master" in a code comment.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:19:27 -07:00
Denton Liu 3d09c22869 builtin/diff-tree: learn --merge-base
The previous commit introduced ---merge-base a way to take the diff
between the working tree or index and the merge base between an arbitrary
commit and HEAD. It makes sense to extend this option to support the
case where two commits are given too and behave in a manner identical to
`git diff A...B`.

Introduce the --merge-base flag as an alternative to triple-dot
notation. Thus, we would be able to write the above as
`git diff --merge-base A B`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 13:37:03 -07:00
Denton Liu 0f5a1d449b builtin/diff-index: learn --merge-base
There is currently no easy way to take the diff between the working tree
or index and the merge base between an arbitrary commit and HEAD. Even
diff's `...` notation doesn't allow this because it only works between
commits. However, the ability to do this would be desirable to a user
who would like to see all the changes they've made on a branch plus
uncommitted changes without taking into account changes made in the
upstream branch.

Teach diff-index and diff (with one commit) the --merge-base option
which allows a user to use the merge base of a commit and HEAD as the
"before" side.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-20 21:30:26 -07:00
Denton Liu 4c3fe82ef1 diff-lib: accept option flags in run_diff_index()
In a future commit, we will teach run_diff_index() to accept more
options via flag bits. For now, change `cached` into a flag in the
`option` bitfield. The behaviour should remain exactly the same.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-20 21:30:26 -07:00
Junio C Hamano 80cacaec41 Merge branch 'mt/config-fail-nongit-early'
Unlike "git config --local", "git config --worktree" did not fail
early and cleanly when started outside a git repository.

* mt/config-fail-nongit-early:
  config: complain about --worktree outside of a git repo
2020-09-18 17:58:06 -07:00
Junio C Hamano 9d4e7ec4d9 Merge branch 'jc/quote-path-cleanup'
"git status --short" quoted a path with SP in it when tracked, but
not those that are untracked, ignored or unmerged.  They are all
shown quoted consistently.

* jc/quote-path-cleanup:
  quote: turn 'nodq' parameter into a set of flags
  quote: rename misnamed sq_lookup[] to cq_lookup[]
  wt-status: consistently quote paths in "status --short" output
  quote_path: code clarification
  quote_path: optionally allow quoting a path with SP in it
  quote_path: give flags parameter to quote_path()
  quote_path: rename quote_path_relative() to quote_path()
2020-09-18 17:58:04 -07:00
Junio C Hamano 45f462b5c5 Merge branch 'es/wt-add-detach'
"git worktree add" learns that the "-d" is a synonym to "--detach"
option to create a new worktree without being on a branch.

* es/wt-add-detach:
  git-worktree.txt: discuss branch-based vs. throwaway worktrees
  worktree: teach `add` to recognize -d as shorthand for --detach
  git-checkout.txt: document -d short option for --detach
2020-09-18 17:58:04 -07:00
Junio C Hamano e96b271d18 Merge branch 'jc/add-i-use-builtin-experimental'
The "add -i/-p" machinery has been written in C but it is not used
by default yet.  It is made default to those who are participating
in feature.experimental experiment.

* jc/add-i-use-builtin-experimental:
  add -i: use the built-in version when feature.experimental is set
2020-09-18 17:58:02 -07:00
Junio C Hamano 21de7e9c50 Merge branch 'rs/refspec-leakfix'
Leakfix.

* rs/refspec-leakfix:
  refspec: add and use refspec_appendf()
  push: release strbufs used for refspec formatting
2020-09-18 17:58:00 -07:00
Junio C Hamano 9b8074427b Merge branch 'rs/misc-cleanups'
Misc cleanups.

* rs/misc-cleanups:
  pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
  midx: use hashwrite_u8() in write_midx_header()
  fast-import: use write_pack_header()
2020-09-18 17:58:00 -07:00
Taylor Blau d356d5debe commit-graph: introduce 'commitGraph.maxNewFilters'
Introduce a configuration variable to specify a default value for the
recently-introduce '--max-new-filters' option of 'git commit-graph
write'.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-18 10:39:22 -07:00
Taylor Blau 809e0327f5 builtin/commit-graph.c: introduce '--max-new-filters=<n>'
Introduce a command-line flag to specify the maximum number of new Bloom
filters that a 'git commit-graph write' is willing to compute from
scratch.

Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).

This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:

  * There may be lots of filters whose diffs take a long time to
    generate (for example, they have close to the maximum number of
    changes, diffing itself takes a long time, etc).

  * Old-style commit-graphs (which encode filters with too many entries
    as not having been computed at all) cause us to waste time
    recomputing filters that appear to have not been computed only to
    discover that they are too-large.

This can make the upper-bound of the time it takes for 'git commit-graph
write --changed-paths' to be rather unpredictable.

To make this command behave more predictably, introduce
'--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-18 10:35:39 -07:00
Taylor Blau 98bb796191 commit-graph: rename 'split_commit_graph_opts'
In the subsequent commit, additional options will be added to the
commit-graph API which have nothing to do with splitting.

Rename the 'split_commit_graph_opts' structure to the more-generic
'commit_graph_opts' to encompass both. Likewise, rename the 'flags'
member to instead be 'split_flags' to clarify that it only has to do
with the behavior implied by '--split'.

Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 21:55:50 -07:00
Christian Schlack 5a07c6c3c2 remote: don't show success message when set-head fails
Suppress the message 'origin/HEAD set to master' in case of an error.

  $ git remote set-head origin -a
  error: Not a valid ref: refs/remotes/origin/master
  origin/HEAD set to master

Signed-off-by: Christian Schlack <christian@backhub.co>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:40:17 -07:00
Derrick Stolee 25914c4fde maintenance: add trace2 regions for task execution
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 4ddc79b2da maintenance: add auto condition for commit-graph task
Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.

This count is controlled by the maintenance.commit-graph.auto config
option.

To compute the count, use a depth-first search starting at each ref, and
leaving markers using the SEEN flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 916d0626c2 maintenance: use pointers to check --auto
The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.

Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.

First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.

Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.

Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.

We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 65d655b52d maintenance: create maintenance.<task>.enabled config
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.

Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee d7514f6ed5 maintenance: take a lock on the objects directory
Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.

If the lock file already exists, then no maintenance tasks are
attempted. This will become very important later when we implement the
'prefetch' task, as this is our stop-gap from creating a recursive process
loop between 'git fetch' and 'git maintenance run --auto'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 090511bc0b maintenance: add --task option
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.

Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.

Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 663b2b1b90 maintenance: add commit-graph task
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command

	git commit-graph write --reachable --split

By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.

Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)

If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 3103e9848f maintenance: initialize task array
In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.

The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.

A list these structs are initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.

The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will print an error message.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee a95ce12430 maintenance: replace run_auto_gc()
The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.

To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.

Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.

Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 3ddaad0e06 maintenance: add --quiet option
Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.

Pipe the option to the 'git gc' child process.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:05 -07:00
Derrick Stolee 2057d75038 maintenance: create basic maintenance runner
The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.

Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.

For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.

Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.

Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.

Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:30:04 -07:00
Lin Sun dbd8c09bfe mergetool: allow auto-merge for meld to follow the vim-diff behavior
Make the mergetool used with "meld" backend behave similarly to "vimdiff" by
telling it to auto-merge non-conflicting parts and highlight the conflicting
parts when `mergetool.meld.useAutoMerge` is configured with `true`, or `auto`
for detecting the `--auto-merge` option automatically.

Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Helped-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Lin Sun <lin.sun@zoom.us>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-16 17:11:20 -07:00
Aaron Lipman 21bf933928 ref-filter: allow merged and no-merged filters
Enable ref-filter to process multiple merged and no-merged filters, and
extend functionality to git branch, git tag and git for-each-ref. This
provides an easy way to check for branches that are "graduation
candidates:"

$ git branch --no-merged master --merged next

If passed more than one merged (or more than one no-merged) filter, refs
must be reachable from any one of the merged commits, and reachable from
none of the no-merged commits.

Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-16 12:38:10 -07:00
Alex Riesen 4e735c1326 fetch: pass --ipv4 and --ipv6 options to sub-fetches
The options indicate user intent for the whole fetch operation, and
ignoring them in sub-fetches (i.e. "--all" and recursive fetching of
submodules) is quite unexpected when, for instance, it is intended
to limit all of the communication to a specific transport protocol
for some reason.

Signed-off-by: Alex Riesen <alexander.riesen@cetitec.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-15 14:15:05 -07:00
Junio C Hamano 88910c9939 quote_path: give flags parameter to quote_path()
The quote_path() function computes a path (relative to its base
directory) and c-quotes the result if necessary.  Teach it to take a
flags parameter to allow its behaviour to be enriched later.

No behaviour change intended.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-10 10:49:19 -07:00
Junio C Hamano c34d24b8a4 quote_path: rename quote_path_relative() to quote_path()
There is no quote_path_absolute() or anything that causes confusion,
and one of the two large consumers already rename the long name
locally with a preprocessor macro.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-10 10:49:17 -07:00
Junio C Hamano 0df670bc0b Merge branch 'jt/interpret-branch-name-fallback'
"git status" has trouble showing where it came from by interpreting
reflog entries that recordcertain events, e.g. "checkout @{u}", and
gives a hard/fatal error.  Even though it inherently is impossible
to give a correct answer because the reflog entries lose some
information (e.g. "@{u}" does not record what branch the user was
on hence which branch 'the upstream' needs to be computed, and even
if the record were available, the relationship between branches may
have changed), at least hide the error to allow "status" show its
output.

* jt/interpret-branch-name-fallback:
  wt-status: tolerate dangling marks
  refs: move dwim_ref() to header file
  sha1-name: replace unsigned int with option struct
2020-09-09 13:53:09 -07:00
Junio C Hamano 9f7833fd55 Merge branch 'ss/submodule-summary-in-c-fixes'
Fixups to a topic in 'next'.

* ss/submodule-summary-in-c-fixes:
  t7421: eliminate 'grep' check in t7421.4 for mingw compatibility
  submodule: fix style in function definition
  submodule: eliminate unused parameters from print_submodule_summary()
2020-09-09 13:53:07 -07:00
Junio C Hamano eb7460fd31 Merge branch 'es/worktree-repair'
"git worktree" gained a "repair" subcommand to help users recover
after moving the worktrees or repository manually without telling
Git.  Also, "git init --separate-git-dir" no longer corrupts
administrative data related to linked worktrees.

* es/worktree-repair:
  init: make --separate-git-dir work from within linked worktree
  init: teach --separate-git-dir to repair linked worktrees
  worktree: teach "repair" to fix outgoing links to worktrees
  worktree: teach "repair" to fix worktree back-links to main worktree
  worktree: add skeleton "repair" command
2020-09-09 13:53:07 -07:00
Junio C Hamano 1aadb47aad Merge branch 'jk/worktree-check-clean-leakfix'
Leakfix.

* jk/worktree-check-clean-leakfix:
  worktree: fix leak in check_clean_worktree()
2020-09-09 13:53:07 -07:00
Junio C Hamano a31677dde3 Merge branch 'tb/repack-clearing-midx'
When a packfile is removed by "git repack", multi-pack-index gets
cleared; the code was taught to do so less aggressively by first
checking if the midx actually refers to a pack that no longer
exists.

* tb/repack-clearing-midx:
  midx: traverse the local MIDX first
  builtin/repack.c: invalidate MIDX only when necessary
2020-09-09 13:53:06 -07:00
Junio C Hamano bbdba3d883 Merge branch 'ss/submodule-summary-in-c'
Yet another subcommand of "git submodule" is getting rewritten in C.

* ss/submodule-summary-in-c:
  submodule: port submodule subcommand 'summary' from shell to C
  t7421: introduce a test script for verifying 'summary' output
  submodule: rename helper functions to avoid ambiguity
  submodule: remove extra line feeds between callback struct and macro
2020-09-09 13:53:05 -07:00
Taylor Blau ab14d0676c commit-graph: pass a 'struct repository *' in more places
In a future commit, some commit-graph internals will want access to
'r->settings', but we only have the 'struct object_directory *'
corresponding to that repository.

Add an additional parameter to pass the repository around in more
places.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-09 12:51:48 -07:00
Matheus Tavares 378fe5fc3d config: complain about --worktree outside of a git repo
Running `git config --worktree` outside of a git repository hits a BUG()
when trying to enumerate the worktrees. Let's catch this error earlier
and die() with a friendlier message.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-09 12:47:47 -07:00
Jonathan Tan f08cbf60fe index-pack: make quantum of work smaller
Currently, when index-pack resolves deltas, it does not split up delta
trees into threads: each delta base root (an object that is not a
REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
that root (direct or indirect) are processed in the same thread.

This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file.

This patch contains a solution to that. When cloning using

  git -c core.deltabasecachelimit=1g clone \
    https://fuchsia.googlesource.com/third_party/vulkan-cts

on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).

The solution is to have a global work stack. This stack contains delta
bases (objects, whether appearing directly in the packfile or generated
by delta resolution, that themselves have delta children) that need to
be processed; whenever a thread needs work, it peeks at the top of the
stack and processes its next unprocessed child. If a thread finds the
stack empty, it will look for more delta base roots to push on the stack
instead.

The main weakness of having a global work stack is that more time is
spent in the mutex, but profiling has shown that most time is spent in
the resolution of the deltas themselves, so this shouldn't be an issue
in practice. In any case, experimentation (as described in the clone
command above) shows that this patch is a net improvement.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-08 15:52:17 -07:00
Eric Sunshine 07a7f8debf format-patch: use 'origin' as start of current-series-range when known
When formatting a patch series over `origin..HEAD`, one would expect
that range to be used as the current-series-range when computing a
range-diff between the previous and current versions of a patch series.
However, infer_range_diff_ranges() ignores `origin..HEAD` when
--range-diff=<prev> specifies a single revision rather than a range, and
instead unexpectedly computes the current-series-range based upon
<prev>. Address this anomaly by unconditionally using `origin..HEAD` as
the current-series-range regardless of <prev> as long as `origin` is
known, and only fall back to basing current-series-range on <prev> when
`origin` is not known.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-08 15:03:27 -07:00
Eric Sunshine 72a7239016 diff-lib: tighten show_interdiff()'s interface
To compute and show an interdiff, show_interdiff() needs only the two
OID's to compare and a diffopts, yet it expects callers to supply an
entire rev_info. The demand for rev_info is not only overkill, but also
places unnecessary burden on potential future callers which might not
otherwise have a rev_info at hand. Address this by tightening its
signature to require only the items it needs instead of a full rev_info.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-08 15:03:27 -07:00
Eric Sunshine cdffbdc217 diff: move show_interdiff() from its own file to diff-lib
show_interdiff() is a relatively small function and not likely to grow
larger or more complicated. Rather than dedicating an entire source file
to it, relocate it to diff-lib.c which houses other "take two things and
compare them" functions meant to be re-used but not so low-level as to
reside in the core diff implementation.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-08 15:03:26 -07:00
Junio C Hamano 2df2d81ddd add -i: use the built-in version when feature.experimental is set
We have had parallel implementations of "add -i/-p" since 2.25 and
have been using them from various codepaths since 2.26 days, but
never made the built-in version the default.

We have found and fixed a handful of corner case bugs in the
built-in version, and it may be a good time to start switching over
the user base from the scripted version to the built-in version.
Let's enable the built-in version for those who opt into the
feature.experimental guinea-pig program to give wider exposure.

Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-08 14:53:36 -07:00
Eric Sunshine c670aa47df worktree: teach add to recognize -d as shorthand for --detach
Like `git switch` and `git checkout`, `git worktree add` can check out a
branch or set up a detached HEAD. However, unlike those other commands,
`git worktree add` does not understand -d as shorthand for --detach,
which may confound users accustomed to using -d for this purpose.
Address this shortcoming by teaching `add` to recognize -d for --detach,
thus bringing it in line with the other commands.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-06 18:53:56 -07:00
René Scharfe ccb181d0f0 fast-import: use write_pack_header()
Call write_pack_header() to hash and write a pack header instead of
open-coding this function.  This gets rid of duplicate code and of the
magic version number 2 -- which has been used here since c90be46abd
(Changed fast-import's pack header creation to use pack.h, 2006-08-16)
and in pack.h (again) since 29f049a0c2 (Revert "move pack creation to
version 3", 2006-10-14).

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-06 13:40:37 -07:00
René Scharfe 1af8b8c0a5 refspec: add and use refspec_appendf()
Add a function for building a refspec using printf-style formatting.  It
frees callers from managing their own buffer.  Use it throughout the
tree to shorten and simplify its callers.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-06 13:15:46 -07:00
René Scharfe 30035d9c66 push: release strbufs used for refspec formatting
map_refspec() either returns the passed in ref string or a detached
strbuf.  This makes it hard for callers to release the possibly
allocated memory, and set_refspecs() consequently leaks it.

Let map_refspec() append any refspecs directly and release its own
strbufs after use.  Rename it to refspec_append_mapped() and don't
return anything to reflect its increased responsibility.

set_refspecs() also leaks its strbufs.  Do the same here and directly
call refspec_append() in each if branch instead of holding onto a
detached strbuf, then dispose of the allocated memory after use.  We
need to add an else branch for the final call because all the other
conditional branches already add their formatted refspec now.

setup_push_upstream() and setup_push_current() forgot to release their
strbufs as well; plug these leaks, too, while at it.

None of these leaks were likely to impact users, because the number
and sizes of refspecs are usually small and the allocations are only
done once per program run.  Clean them up nevertheless, as another
step on the long road towards zero memory leaks.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-06 13:15:45 -07:00
Orgad Shaneh 7ea0c2f44d fetch: do not look for submodule changes in unchanged refs
When fetching recursively with submodules, for each ref in the
superproject, we call check_for_new_submodule_commits() which collects all
the objects that have to be checked for submodule changes on
calculate_changed_submodule_paths(). On the first call, it also collects all
the existing refs for excluding them from the scan.

calculate_changed_submodule_paths() creates an argument array with all the
collected new objects, followed by --not and all the old objects. This argv
is passed to setup_revisions, which parses each argument, converts it back
to an oid and resolves the object. The parsing itself also does redundant
work, because it is treated like user input, while in fact it is a full
oid. So it needlessly attempts to look it up as ref (checks if it has ^, ~
etc.), checks if it is a file name etc.

For a repository with many refs, all of this is expensive. But if the fetch
in the superproject did not update the ref (i.e. the objects that are
required to exist in the submodule did not change), there is no need to
include it in the list.

Before commit be76c212 (fetch: ensure submodule objects fetched,
2018-12-06), submodule reference changes were only detected for refs that
were changed, but not for new refs. This commit covered also this case, but
what it did was to just include every ref.

This change should reduce the number of scanned refs by about half (except
the case of a no-op fetch, which will not scan any ref), because all the
existing refs will still be listed after --not.

The regression was reported here:
https://public-inbox.org/git/CAGHpTBKSUJzFSWc=uznSu2zB33qCSmKXM-
iAjxRCpqNK5bnhRg@mail.gmail.com/

Signed-off-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-06 09:50:49 -07:00
Junio C Hamano da6b99c39a Merge branch 'hl/bisect-doc-clarify-bad-good-ordering'
Doc update.

* hl/bisect-doc-clarify-bad-good-ordering:
  bisect: swap command-line options in documentation
2020-09-03 12:37:06 -07:00
Junio C Hamano b4100f366c Merge branch 'jt/lazy-fetch'
Updates to on-demand fetching code in lazily cloned repositories.

* jt/lazy-fetch:
  fetch: no FETCH_HEAD display if --no-write-fetch-head
  fetch-pack: remove no_dependents code
  promisor-remote: lazy-fetch objects in subprocess
  fetch-pack: do not lazy-fetch during ref iteration
  fetch: only populate existing_refs if needed
  fetch: avoid reading submodule config until needed
  fetch: allow refspecs specified through stdin
  negotiator/noop: add noop fetch negotiator
2020-09-03 12:37:04 -07:00
Junio C Hamano 18aff08e04 Merge branch 'jc/undash-in-tree-git-callers'
A handful of places in in-tree code still relied on being able to
execute the git subcommands, especially built-ins, in "git-foo"
form, which have been corrected.

* jc/undash-in-tree-git-callers:
  credential-cache: use child_process.args
  cvsexportcommit: do not run git programs in dashed form
  transport-helper: do not run git-remote-ext etc. in dashed form
2020-09-03 12:37:03 -07:00
Junio C Hamano afd49c39dd Merge branch 'jk/slimmed-down'
Trim an unused binary and turn a bunch of commands into built-in.

* jk/slimmed-down:
  drop vcs-svn experiment
  make git-fast-import a builtin
  make git-bugreport a builtin
  make credential helpers builtins
  Makefile: drop builtins from MSVC pdb list
2020-09-03 12:37:02 -07:00
Junio C Hamano 9c31b19dd0 Merge branch 'pw/rebase-i-more-options'
"git rebase -i" learns a bit more options.

* pw/rebase-i-more-options:
  t3436: do not run git-merge-recursive in dashed form
  rebase: add --reset-author-date
  rebase -i: support --ignore-date
  rebase -i: support --committer-date-is-author-date
  am: stop exporting GIT_COMMITTER_DATE
  rebase -i: add --ignore-whitespace flag
2020-09-03 12:37:01 -07:00
Jonathan Tan f24c30e0b6 wt-status: tolerate dangling marks
When a user checks out the upstream branch of HEAD, the upstream branch
not being a local branch, and then runs "git status", like this:

  git clone $URL client
  cd client
  git checkout @{u}
  git status

no status is printed, but instead an error message:

  fatal: HEAD does not point to a branch

(This error message when running "git branch" persists even after
checking out other things - it only stops after checking out a branch.)

This is because "git status" reads the reflog when determining the "HEAD
detached" message, and thus attempts to DWIM "@{u}", but that doesn't
work because HEAD no longer points to a branch.

Therefore, when calculating the status of a worktree, tolerate dangling
marks. This is done by adding an additional parameter to
dwim_ref() and repo_dwim_ref().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-02 14:39:25 -07:00
Jonathan Tan db3c293ecd fetch: no FETCH_HEAD display if --no-write-fetch-head
887952b8c6 ("fetch: optionally allow disabling FETCH_HEAD update",
2020-08-18) introduced the ability to disable writing to FETCH_HEAD
during fetch, but did not suppress the "<source> -> FETCH_HEAD" message
when this ability is used. This message is misleading in this case,
because FETCH_HEAD is not written. Also, because "fetch" is used to
lazy-fetch missing objects in a partial clone, this significantly
clutters up the output in that case since the objects to be fetched are
potentially numerous.

Therefore, suppress this message when --no-write-fetch-head is passed
(but not when --dry-run is set).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-02 14:26:55 -07:00
Junio C Hamano c57afd73ef Merge branch 'rs/checkout-no-overlay-pathspec-fix'
"git restore/checkout --no-overlay" with wildcarded pathspec
mistakenly removed matching paths in subdirectories, which has been
corrected.

* rs/checkout-no-overlay-pathspec-fix:
  checkout, restore: make pathspec recursive
2020-08-31 15:49:50 -07:00
Junio C Hamano cca424ba90 Merge branch 'jk/refspecs-cleanup'
Preliminary code clean-up before introducing "negative refspec".

* jk/refspecs-cleanup:
  refspec: make sure stack refspec_item variables are zeroed
  refspec: fix documentation referring to refspec_item
2020-08-31 15:49:48 -07:00
Junio C Hamano e699684cf6 Merge branch 'hn/refs-pseudorefs'
Accesses to two pseudorefs have been updated to properly use ref
API.

* hn/refs-pseudorefs:
  sequencer: treat REVERT_HEAD as a pseudo ref
  builtin/commit: suggest update-ref for pseudoref removal
  sequencer: treat CHERRY_PICK_HEAD as a pseudo ref
  refs: make refs_ref_exists public
2020-08-31 15:49:48 -07:00
Junio C Hamano 53015c9dd4 Merge branch 'jk/index-pack-w-more-threads'
Long ago, we decided to use 3 threads by default when running the
index-pack task in parallel, which has been adjusted a bit upwards.

* jk/index-pack-w-more-threads:
  index-pack: adjust default threading cap
  p5302: count up to online-cpus for thread tests
  p5302: disable thread-count parameter tests by default
2020-08-31 15:49:48 -07:00
Eric Sunshine 59d876ccd6 init: make --separate-git-dir work from within linked worktree
The intention of `git init --separate-work-dir=<path>` is to move the
.git/ directory to a location outside of the main worktree. When used
within a linked worktree, however, rather than moving the .git/
directory as intended, it instead incorrectly moves the worktree's
.git/worktrees/<id> directory to <path>, thus disconnecting the linked
worktree from its parent repository and breaking the worktree in the
process since its local .git file no longer points at a location at
which it can find the object database. Fix this broken behavior.

An intentional side-effect of this change is that it also closes a
loophole not caught by ccf236a23a (init: disallow --separate-git-dir
with bare repository, 2020-08-09) in which the check to prevent
--separate-git-dir being used in conjunction with a bare repository was
unable to detect the invalid combination when invoked from within a
linked worktree. Therefore, add a test to verify that this loophole is
closed, as well.

Reported-by: Henré Botha <henrebotha@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-31 11:47:45 -07:00
Eric Sunshine 42264bc841 init: teach --separate-git-dir to repair linked worktrees
A linked worktree's .git file is a "gitfile" pointing at the
.git/worktrees/<id> directory within the repository. When `git init
--separate-git-dir=<path>` is used on an existing repository to relocate
the repository's .git/ directory to a different location, it neglects to
update the .git files of linked worktrees, thus breaking the worktrees
by making it impossible for them to locate the repository. Fix this by
teaching --separate-git-dir to repair the .git file of each linked
worktree to point at the new repository location.

Reported-by: Henré Botha <henrebotha@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-31 11:47:45 -07:00
Eric Sunshine b214ab5aa5 worktree: teach "repair" to fix outgoing links to worktrees
The .git/worktrees/<id>/gitdir file points at the location of a linked
worktree's .git file. Its content must be of the form
/path/to/worktree/.git (from which the location of the worktree itself
can be derived by stripping the "/.git" suffix). If the gitdir file is
deleted or becomes corrupted or outdated, then Git will be unable to
find the linked worktree. An easy way for the gitdir file to become
outdated is for the user to move the worktree manually (without using
"git worktree move"). Although it is possible to manually update the
gitdir file to reflect the new linked worktree location, doing so
requires a level of knowledge about worktree internals beyond what a
user should be expected to know offhand.

Therefore, teach "git worktree repair" how to repair broken or outdated
.git/worktrees/<id>/gitdir files automatically. (For this to work, the
command must either be invoked from within the worktree whose gitdir
file requires repair, or from within the main or any linked worktree by
providing the path of the broken worktree as an argument to "git
worktree repair".)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-31 11:47:45 -07:00
Eric Sunshine bdd1f3e4da worktree: teach "repair" to fix worktree back-links to main worktree
The .git file in a linked worktree is a "gitfile" which points back to
the .git/worktrees/<id> entry in the main worktree or bare repository.
If a worktree's .git file is deleted or becomes corrupted or outdated,
then the linked worktree won't know how to find the repository or any of
its own administrative files (such as 'index', 'HEAD', etc.). An easy
way for the .git file to become outdated is for the user to move the
main worktree or bare repository. Although it is possible to manually
update each linked worktree's .git file to reflect the new repository
location, doing so requires a level of knowledge about worktree
internals beyond what a user should be expected to know offhand.

Therefore, teach "git worktree repair" how to repair broken or outdated
worktree .git files automatically. (For this to work, the command must
be invoked from within the main worktree or bare repository, or from
within a worktree which has not become disconnected from the repository
-- such as one which was created after the repository was moved.)

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-31 11:47:45 -07:00
Miriam Rubio 7b4de74b5d bisect--helper: introduce new write_in_file() function
Let's refactor code adding a new `write_in_file()` function
that opens a file for writing a message and closes it and a
wrapper for writing mode.

This helper will be used in later steps and makes the code
simpler and easier to understand.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 16:21:16 -07:00
Miriam Rubio 30276765c1 bisect--helper: use '-res' in 'cmd_bisect__helper' return
Following 'enum bisect_error' vocabulary, return variable 'res' is
always non-positive.
Let's use '-res' instead of 'abs(res)' to make the code clearer.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 16:21:16 -07:00
Miriam Rubio ef5aef5ee0 bisect--helper: BUG() in cmd_*() on invalid subcommand
In cmd_bisect__helper() function, if an invalid or no
subcommand is passed there is a BUG.

BUG() out instead of returning an error.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 16:21:15 -07:00
Taylor Blau 59552fb3e2 midx: traverse the local MIDX first
When a repository has an alternate object directory configured, callers
can traverse through each alternate's MIDX by walking the '->next'
pointer.

But, when 'prepare_multi_pack_index_one()' loads multiple MIDXs, it
places the new ones at the front of this pointer chain, not at the end.
This can be confusing for callers such as 'git repack -ad', causing test
failures like in t7700.6 with 'GIT_TEST_MULTI_PACK_INDEX=1'.

The occurs when dropping a pack known to the local MIDX with alternates
configured that have their own MIDX. Since the alternate's MIDX is
returned via 'get_multi_pack_index()', 'midx_contains_pack()' returns
true (which is correct, since it traverses through the '->next' pointer
to find the MIDX in the chain that does contain the requested object).
But, we call 'clear_midx_file()' on 'the_repository', which drops the
MIDX at the path of the first MIDX in the chain, which (in the case of
t7700.6 is the one in the alternate).

This patch addresses that by:

  - placing the local MIDX first in the chain when calling
    'prepare_multi_pack_index_one()', and

  - introducing a new 'get_local_multi_pack_index()', which explicitly
    returns the repository-local MIDX, if any.

Don't impose an additional order on the MIDX's '->next' pointer beyond
that the first item in the chain must be local if one exists so that we
avoid a quadratic insertion.

Likewise, use 'get_local_multi_pack_index()' in
'remove_redundant_pack()' to fix the formerly broken t7700.6 when run
with 'GIT_TEST_MULTI_PACK_INDEX=1'.

Finally, note that the MIDX ordering invariant is only preserved by the
insertion order in 'prepare_packed_git()', which traverses through the
ODB's '->next' pointer, meaning we visit the local object store first.
This fragility makes this an undesirable long-term solution if more
callers are added, but it is acceptable for now since this is the only
caller.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 14:07:09 -07:00
Hugo Locurcio ef4d9f8a32 bisect: swap command-line options in documentation
The positional arguments are specified in this order: "bad" then "good".
To avoid confusion, the options above the positional arguments
are now specified in the same order. They can still be specified in any
order since they're options, not positional arguments.

Signed-off-by: Hugo Locurcio <hugo.locurcio@hugo.pro>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 14:06:06 -07:00
Junio C Hamano 0d9a8e33f9 Merge branch 'jk/leakfix'
Code clean-up.

* jk/leakfix:
  submodule--helper: fix leak of core.worktree value
  config: fix leak in git_config_get_expiry_in_days()
  config: drop git_config_get_string_const()
  config: fix leaks from git_config_get_string_const()
  checkout: fix leak of non-existent branch names
  submodule--helper: use strbuf_release() to free strbufs
  clear_pattern_list(): clear embedded hashmaps
2020-08-27 14:04:49 -07:00
Jiang Xin 31e8595a11 receive-pack: new config receive.procReceiveRefs
Add a new multi-valued config variable "receive.procReceiveRefs"
for `receive-pack` command, like the follows:

    git config --system --add receive.procReceiveRefs refs/for
    git config --system --add receive.procReceiveRefs refs/drafts

If the specific prefix strings given by the config variables match the
reference names of the commands which are sent from git client to
`receive-pack`, these commands will be executed by an external hook
(named "proc-receive"), instead of the internal `execute_commands`
function.

For example, if it is set to "refs/for", pushing to a reference such as
"refs/for/master" will not create or update reference "refs/for/master",
but may create or update a pull request directly by running the hook
"proc-receive".

Optional modifiers can be provided in the beginning of the value to
filter commands for specific actions: create (a), modify (m),
delete (d). A `!` can be included in the modifiers to negate the
reference prefix entry. E.g.:

    git config --system --add receive.procReceiveRefs ad:refs/heads
    git config --system --add receive.procReceiveRefs !:refs/heads

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27 12:47:47 -07:00
Jiang Xin 63518a574a New capability "report-status-v2" for git-push
The new introduced "proc-receive" hook may handle a command for a
pseudo-reference with a zero-old as its old-oid, while the hook may
create or update a reference with different name, different new-oid,
and different old-oid (the reference may exist already with a non-zero
old-oid).  Current "report-status" protocol cannot report the status for
such reference rewrite.

Add new capability "report-status-v2" and new report protocol which is
not backward compatible for report of git-push.

If a user pushes to a pseudo-reference "refs/for/master/topic", and
"receive-pack" creates two new references "refs/changes/23/123/1" and
"refs/changes/24/124/1", for client without the knowledge of
"report-status-v2", "receive-pack" will only send "ok/ng" directives in
the report, such as:

    ok ref/for/master/topic

But for client which has the knowledge of "report-status-v2",
"receive-pack" will use "option" directives to report more attributes
for the reference given by the above "ok/ng" directive.

    ok refs/for/master/topic
    option refname refs/changes/23/123/1
    option new-oid <new-oid>
    ok refs/for/master/topic
    option refname refs/changes/24/124/1
    option new-oid <new-oid>

The client will report two new created references to the end user.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27 12:47:47 -07:00
Jiang Xin 195d6eaea3 receive-pack: feed report options to post-receive
When commands are fed to the "post-receive" hook, report options will
be parsed and the real old-oid, new-oid, reference name will feed to
the "post-receive" hook.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27 12:47:47 -07:00
Jiang Xin 15d3af5e22 receive-pack: add new proc-receive hook
Git calls an internal `execute_commands` function to handle commands
sent from client to `git-receive-pack`.  Regardless of what references
the user pushes, git creates or updates the corresponding references if
the user has write-permission.  A contributor who has no
write-permission, cannot push to the repository directly.  So, the
contributor has to write commits to an alternate location, and sends
pull request by emails or by other ways.  We call this workflow as a
distributed workflow.

It would be more convenient to work in a centralized workflow like what
Gerrit provided for some cases.  For example, a read-only user who
cannot push to a branch directly can run the following `git push`
command to push commits to a pseudo reference (has a prefix "refs/for/",
not "refs/heads/") to create a code review.

    git push origin \
        HEAD:refs/for/<branch-name>/<session>

The `<branch-name>` in the above example can be as simple as "master",
or a more complicated branch name like "foo/bar".  The `<session>` in
the above example command can be the local branch name of the client
side, such as "my/topic".

We cannot implement a centralized workflow elegantly by using
"pre-receive" + "post-receive", because Git will call the internal
function "execute_commands" to create references (even the special
pseudo reference) between these two hooks.  Even though we can delete
the temporarily created pseudo reference via the "post-receive" hook,
having a temporary reference is not safe for concurrent pushes.

So, add a filter and a new handler to support this kind of workflow.
The filter will check the prefix of the reference name, and if the
command has a special reference name, the filter will turn a specific
field (`run_proc_receive`) on for the command.  Commands with this filed
turned on will be executed by a new handler (a hook named
"proc-receive") instead of the internal `execute_commands` function.
We can use this "proc-receive" command to create pull requests or send
emails for code review.

Suggested by Junio, this "proc-receive" hook reads the commands,
push-options (optional), and send result using a protocol in pkt-line
format.  In the following example, the letter "S" stands for
"receive-pack" and letter "H" stands for the hook.

    # Version and features negotiation.
    S: PKT-LINE(version=1\0push-options atomic...)
    S: flush-pkt
    H: PKT-LINE(version=1\0push-options...)
    H: flush-pkt

    # Send commands from server to the hook.
    S: PKT-LINE(<old-oid> <new-oid> <ref>)
    S: ... ...
    S: flush-pkt
    # Send push-options only if the 'push-options' feature is enabled.
    S: PKT-LINE(push-option)
    S: ... ...
    S: flush-pkt

    # Receive result from the hook.
    # OK, run this command successfully.
    H: PKT-LINE(ok <ref>)
    # NO, I reject it.
    H: PKT-LINE(ng <ref> <reason>)
    # Fall through, let 'receive-pack' to execute it.
    H: PKT-LINE(ok <ref>)
    H: PKT-LINE(option fall-through)
    # OK, but has an alternate reference.  The alternate reference name
    # and other status can be given in options
    H: PKT-LINE(ok <ref>)
    H: PKT-LINE(option refname <refname>)
    H: PKT-LINE(option old-oid <old-oid>)
    H: PKT-LINE(option new-oid <new-oid>)
    H: PKT-LINE(option forced-update)
    H: ... ...
    H: flush-pkt

After receiving a command, the hook will execute the command, and may
create/update different reference.  For example, a command for a pseudo
reference "refs/for/master/topic" may create/update different reference
such as "refs/pull/123/head".  The alternate reference name and other
status are given in option lines.

The list of commands returned from "proc-receive" will replace the
relevant commands that are sent from user to "receive-pack", and
"receive-pack" will continue to run the "execute_commands" function and
other routines.  Finally, the result of the execution of these commands
will be reported to end user.

The reporting function from "receive-pack" to "send-pack" will be
extended in latter commit just like what the "proc-receive" hook reports
to "receive-pack".

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27 12:47:47 -07:00
Shourya Shukla d79b145569 t7421: eliminate 'grep' check in t7421.4 for mingw compatibility
The 'grep' check in test 4 of t7421 resulted in the failure of t7421 on
Windows due to a different error message

    error: cannot spawn git: No such file or directory

instead of

    fatal: exec 'rev-parse': cd to 'my-subm' failed: No such file or directory

Tighten up the check to compute 'src_abbrev' by guarding the
'verify_submodule_committish()' call using `p->status !='D'`, so that
the former isn't called in case of non-existent submodule directory,
consequently, there is no such error message on any execution
environment. The same need not be implemented for 'dst_abbrev' and is
rather redundant since the conditional 'if (S_ISGITLINK(p->mod_dst))'
already guards the 'verify_submodule_committish()' when we have a
status of 'D'.

Therefore, eliminate the 'grep' check in t7421. Instead, verify the
absence of an error message by doing a 'test_must_be_empty' on the
file containing the error.

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27 11:47:10 -07:00
Eric Sunshine e8e1ff24c5 worktree: add skeleton "repair" command
Worktree administrative files can become corrupted or outdated due to
external factors. Although, it is often possible to recover from such
situations by hand-tweaking these files, doing so requires intimate
knowledge of worktree internals. While information necessary to make
such repairs manually can be obtained from git-worktree.txt and
gitrepository-layout.txt, we can assist users more directly by teaching
git-worktree how to repair its administrative files itself (at least to
some extent). Therefore, add a "git worktree repair" command which
attempts to correct common problems which may arise due to factors
beyond Git's control.

At this stage, the "repair" command is a mere skeleton; subsequent
commits will flesh out the functionality.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27 08:59:13 -07:00
Jeff King 27ed6ccc12 worktree: fix leak in check_clean_worktree()
We allocate a child_env strvec but never free its memory. Instead, let's
just use the strvec that our child_process struct provides, which is
cleaned up automatically when we run the command.

And while we're moving the initialization of the child_process around,
let's switch it to use the official init function (zero-initializing it
works OK, since strvec is happy enough with that, but it sets a bad
example).

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-27 08:30:17 -07:00
Taylor Blau e08f7bb093 builtin/repack.c: invalidate MIDX only when necessary
In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
learned to remove a multi-pack-index file if it added or removed a pack
from the object store.

This mechanism is a little over-eager, since it is only necessary to
drop a MIDX if 'git repack' removes a pack that the MIDX references.
Adding a pack outside of the MIDX does not require invalidating the
MIDX, and likewise for removing a pack the MIDX does not know about.

Teach 'git repack' to check for this by loading the MIDX, and checking
whether the to-be-removed pack is known to the MIDX. This requires a
slightly odd alternation to a test in t5319, which is explained with a
comment. A new test is added to show that the MIDX is left alone when
both packs known to it are marked as .keep, but two packs unknown to it
are removed and combined into one new pack.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-26 13:55:46 -07:00
Shourya Shukla f0c6b6467d submodule: fix style in function definition
The definitions of 'verify_submodule_committish()' and
'print_submodule_summary()' had wrong styling in terms of the asterisk
placement. Amend them.

Also, the warning printed in case of an unexpected file mode printed the
mode in decimal. Print it in octal for enhanced readability.

Reported-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-25 13:43:21 -07:00
Shourya Shukla e0f7ae564e submodule: eliminate unused parameters from print_submodule_summary()
Eliminate the parameters 'missing_{src,dst}' from the
'print_submodule_summary()' function call since they are not used
anywhere in the function.

Reported-by: Jeff King <peff@peff.net>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-25 13:43:10 -07:00
Junio C Hamano ad00f44f54 Merge branch 'en/dir-clear'
Leakfix with code clean-up.

* en/dir-clear:
  dir: fix problematic API to avoid memory leaks
  dir: make clear_directory() free all relevant memory
2020-08-24 14:54:34 -07:00
Junio C Hamano b556050733 Merge branch 'jc/no-update-fetch-head'
"git fetch" learned --no-write-fetch-head option to avoid writing
the FETCH_HEAD file.

* jc/no-update-fetch-head:
  fetch: optionally allow disabling FETCH_HEAD update
2020-08-24 14:54:31 -07:00
Junio C Hamano ff20794402 Merge branch 'jk/unleak-fixes'
Fix some incorrect UNLEAK() annotations.

* jk/unleak-fixes:
  ls-remote: simplify UNLEAK() usage
  stop calling UNLEAK() before die()
2020-08-24 14:54:30 -07:00
Junio C Hamano a654836d96 Merge branch 'es/init-no-separate-git-dir-in-bare'
The purpose of "git init --separate-git-dir" is to initialize a
new project with the repository separate from the working tree,
or, in the case of an existing project, to move the repository
(the .git/ directory) out of the working tree. It does not make
sense to use --separate-git-dir with a bare repository for which
there is no working tree, so disallow its use with bare
repositories.

* es/init-no-separate-git-dir-in-bare:
  init: disallow --separate-git-dir with bare repository
2020-08-24 14:54:28 -07:00
Jonathan Tan ee6f058384 index-pack: make resolve_delta() assume base data
A subsequent commit will make the quantum of work smaller, necessitating
more locking. This commit allows resolve_delta() to be called outside
the lock.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 14:14:52 -07:00
Jonathan Tan b4718cae51 index-pack: calculate {ref,ofs}_{first,last} early
This is refactoring 2 of 2 to simplify struct base_data.

Whenever we make a struct base_data, immediately calculate its delta
children. This eliminates confusion as to when the
{ref,ofs}_{first,last} fields are initialized.

Before this patch, the delta children were calculated at the last
possible moment. This allowed the members of struct base_data to be
populated in any order, superficially useful when we have the object
contents before the struct object_entry. But this makes reasoning about
the state of struct base_data more complicated, hence this patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 14:12:58 -07:00
Jonathan Tan a7f7e84a49 index-pack: remove redundant child field
This is refactoring 1 of 2 to simplify struct base_data.

In index-pack, each thread maintains a doubly-linked list of the delta
chain that it is currently processing (the "base" and "child" pointers
in struct base_data). When a thread exceeds the delta base cache limit
and needs to reclaim memory, it uses the "child" pointers to traverse
the lineage, reclaiming the memory of the eldest delta bases first.

A subsequent patch will perform memory reclaiming in a different way and
will thus no longer need the "child" pointer. Because the "child"
pointer is redundant even now, remove it so that the aforementioned
subsequent patch will be clearer. In the meantime, reclaim memory in the
reverse order of the "base" pointers.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 14:11:14 -07:00
Jonathan Tan 46e6fb1e44 index-pack: unify threaded and unthreaded code
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 14:02:31 -07:00
Jonathan Tan fc968e26c2 index-pack: remove redundant parameter
find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
the object type is already present in the name of the function. Remove
that parameter from these functions.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 13:55:57 -07:00
René Scharfe bfda204ade checkout, restore: make pathspec recursive
The pathspec given to git checkout and git restore is used with both
tree_entry_interesting (via read_tree_recursive) and match_pathspec
(via ce_path_match).  The latter effectively only supports recursive
matching regardless of the value of the pathspec flag "recursive",
which is unset here.

That causes different match results for pathspecs with wildcards, and
can lead checkout and restore in no-overlay mode to remove entries
instead of modifying them.  Enable recursive matching for both checkout
and restore to make matching consistent.

Setting the flag in checkout_main() technically also affects git switch,
but since that command doesn't accept pathspecs at all this has no
actual consequence.

Reported-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Initial-test-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-22 13:37:43 -07:00
Jeff King fbff95b67f index-pack: adjust default threading cap
Commit b8a2486f15 (index-pack: support multithreaded delta resolving,
2012-05-06) describes an experiment that shows that setting the number
of threads for index-pack higher than 3 does not help.

I repeated that experiment using a more modern version of Git and a more
modern CPU and got different results.

Here are timings for p5302 against linux.git run on my laptop, a Core
i9-9880H with 8 cores plus hyperthreading (so online-cpus returns 16):

  5302.3: index-pack 0 threads                   256.28(253.41+2.79)
  5302.4: index-pack 1 threads                   257.03(254.03+2.91)
  5302.5: index-pack 2 threads                   149.39(268.34+3.06)
  5302.6: index-pack 4 threads                   94.96(294.10+3.23)
  5302.7: index-pack 8 threads                   68.12(339.26+3.89)
  5302.8: index-pack 16 threads                  70.90(655.03+7.21)
  5302.9: index-pack default number of threads   116.91(290.05+3.21)

You can see that wall-clock times continue to improve dramatically up to
the number of cores, but bumping beyond that (into hyperthreading
territory) does not help (and in fact hurts a little).

Here's the same experiment on a machine with dual Xeon 6230's, totaling
40 cores (80 with hyperthreading):

  5302.3: index-pack 0 threads                    310.04(302.73+6.90)
  5302.4: index-pack 1 threads                    310.55(302.68+7.40)
  5302.5: index-pack 2 threads                    178.17(304.89+8.20)
  5302.6: index-pack 5 threads                    99.53(315.54+9.56)
  5302.7: index-pack 10 threads                   72.80(327.37+12.79)
  5302.8: index-pack 20 threads                   60.68(357.74+21.66)
  5302.9: index-pack 40 threads                   58.07(454.44+67.96)
  5302.10: index-pack 80 threads                  59.81(720.45+334.52)
  5302.11: index-pack default number of threads   134.18(309.32+7.98)

The results are similar; things stop improving at 40 threads. Curiously,
going from 20 to 40 really doesn't help much, either (and increases CPU
time considerably). So that may represent an actual barrier to
parallelism, where we lose out due to context-switching and loss of
cache locality, but don't reap the wall-clock benefits due to contention
of our coarse-grained locks.

So what's a good default value? It's clear that the current cap of 3 is
too low; our default values are 42% and 57% slower than the best times
on each machine. The results on the 40-core machine imply that 20
threads is an actual barrier regardless of the number of cores, so we'll
take that as a maximum. We get the best results on these machines at
half of the online-cpus value. That's presumably a result of the
hyperthreading. That's common on multi-core Intel processors, but not
necessarily elsewhere. But if we take it as an assumption, we can
perform optimally on hyperthreaded machines and still do much better
than the status quo on other machines, as long as we never half below
the current value of 3.

So that's what this patch does.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21 12:02:36 -07:00
Han-Wen Nienhuys b6d2558c9e builtin/commit: suggest update-ref for pseudoref removal
When pseudorefs move to a different ref storage mechanism, pseudorefs no longer
can be removed with 'rm'. Instead, suggest a "update-ref -d" command, which will
work regardless of ref storage backend.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21 11:20:10 -07:00
Han-Wen Nienhuys c8e4159efd sequencer: treat CHERRY_PICK_HEAD as a pseudo ref
Check for existence and delete CHERRY_PICK_HEAD through ref functions.
This will help cherry-pick work with alternate ref storage backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21 11:20:10 -07:00
Junio C Hamano 2a978f8273 Merge branch 'jc/object-names-are-not-sha-1'
A few end-user facing messages have been updated to be
hash-algorithm agnostic.

* jc/object-names-are-not-sha-1:
  messages: avoid SHA-1 in end-user facing messages
2020-08-19 16:14:52 -07:00
Rohit Ashiwal 27126692ba rebase: add --reset-author-date
The previous commit introduced --ignore-date flag to rebase -i, but the
name is rather vague as it does not say whether the author date or the
committer date is ignored. Add an alias to convey the precise purpose.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19 15:22:56 -07:00
Phillip Wood a3894aad67 rebase -i: support --ignore-date
Rebase is implemented with two different backends - 'apply' and
'merge' each of which support a different set of options. In
particular the apply backend supports a number of options implemented
by 'git am' that are not implemented in the merge backend. This means
that the available options are different depending on which backend is
used which is confusing. This patch adds support for the --ignore-date
option to the merge backend. This option uses the current time as the
author date rather than reusing the original author date when
rewriting commits. We take care to handle the combination of
--ignore-date and --committer-date-is-author-date in the same way as
the apply backend.

Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19 15:19:59 -07:00
Elijah Newren eceba53214 dir: fix problematic API to avoid memory leaks
The dir structure seemed to have a number of leaks and problems around
it.  First I noticed that parent_hashmap and recursive_hashmap were
being leaked (though Peff noticed and submitted fixes before me).  Then
I noticed in the previous commit that clear_directory() was only taking
responsibility for a subset of fields within dir_struct, despite the
fact that entries[] and ignored[] we allocated internally to dir.c.
That, of course, resulted in many callers either leaking or haphazardly
trying to free these arrays and their contents.

Digging further, I found that despite the pretty clear documentation
near the top of dir.h that folks were supposed to call clear_directory()
when the user no longer needed the dir_struct, there were four callers
that didn't bother doing that at all.  However, two of them clearly
thought about leaks since they had an UNLEAK(dir) directive, which to me
suggests that the method to free the data was too unclear.  I suspect
the non-obviousness of the API and its holes led folks to avoid it,
which then snowballed into further problems with the entries[],
ignored[], parent_hashmap, and recursive_hashmap problems.

Rename clear_directory() to dir_clear() to be more in line with other
data structures in git, and introduce a dir_init() to handle the
suggested memsetting of dir_struct to all zeroes.  I hope that a name
like "dir_clear()" is more clear, and that the presence of dir_init()
will provide a hint to those looking at the code that they need to look
for either a dir_clear() or a dir_free() and lead them to find
dir_clear().

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 17:17:31 -07:00
Elijah Newren dad4f23ce5 dir: make clear_directory() free all relevant memory
The calling convention for the dir API is supposed to end with a call to
clear_directory() to free up no longer needed memory.  However,
clear_directory() didn't free dir->entries or dir->ignored.  I believe
this was an oversight, but a number of callers noticed memory leaks and
started free'ing these.  Unfortunately, they did so somewhat haphazardly
(sometimes freeing the entries in the arrays, and sometimes only
free'ing the arrays themselves).  This suggests the callers weren't
trying to make sure any possible memory used might be free'd, but just
the memory they noticed their usecase definitely had allocated.

Fix this mess by moving all the duplicated free'ing logic into
clear_directory().  End by resetting dir to a pristine state so it could
be reused if desired.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 17:17:29 -07:00
Jonathan Tan 9dfa8dbeee fetch-pack: remove no_dependents code
Now that Git has switched to using a subprocess to lazy-fetch missing
objects, remove the no_dependents code as it is no longer used.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 16:46:53 -07:00
Jonathan Tan abcb7eeb31 fetch: only populate existing_refs if needed
In "fetch", get_ref_map() iterates over all refs to populate
"existing_refs" in order to populate peer_ref->old_oid in the returned
refmap, even if the refmap has no peer_ref set - which is the case when
only literal hashes (i.e. no refs by name) are fetched.

Iterating over refs causes the targets of those refs to be checked for
existence. Avoiding this is especially important when we use "git fetch"
to perform lazy fetches in a partial clone because a target of such a
ref may need to be itself lazy-fetched (and otherwise causing an
infinite loop).

Therefore, avoid populating "existing_refs" until necessary. With this
patch, because Git lazy-fetches objects by literal hashes (to be done in
a subsequent commit), it will then be able to guarantee avoiding reading
targets of refs.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 13:25:05 -07:00
Jonathan Tan e5b942136e fetch: avoid reading submodule config until needed
In "fetch", there are two parameters submodule_fetch_jobs_config and
recurse_submodules that can be set in a variety of ways: through
.gitmodules, through .git/config, and through the command line.
Currently "fetch" handles this by first reading .gitmodules, then
reading .git/config (allowing it to overwrite existing values), then
reading the command line (allowing it to overwrite existing values).

Notice that we can avoid reading .gitmodules if .git/config and/or the
command line already provides us with what we need. In addition, if
recurse_submodules is found to be "no", we do not need the value of
submodule_fetch_jobs_config.

Avoiding reading .gitmodules is especially important when we use "git
fetch" to perform lazy fetches in a partial clone because the
.gitmodules file itself might need to be lazy fetched (and otherwise
causing an infinite loop).

In light of all this, avoid reading .gitmodules until necessary. When
reading it, we may only need one of the two parameters it provides, so
teach fetch_config_from_gitmodules() to support NULL arguments. With
this patch, users (including Git itself when invoking "git fetch" to
lazy-fetch) will be able to guarantee avoiding reading .gitmodules by
passing --recurse-submodules=no.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 13:25:05 -07:00
Jonathan Tan 2b713c272c fetch: allow refspecs specified through stdin
In a subsequent patch, partial clones will be taught to fetch missing
objects using a "git fetch" subprocess. Because the number of objects
fetched may be too numerous to fit on the command line, teach "fetch" to
accept refspecs passed through stdin.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 13:25:05 -07:00
Junio C Hamano 887952b8c6 fetch: optionally allow disabling FETCH_HEAD update
If you run fetch but record the result in remote-tracking branches,
and either if you do nothing with the fetched refs (e.g. you are
merely mirroring) or if you always work from the remote-tracking
refs (e.g. you fetch and then merge origin/branchname separately),
you can get away with having no FETCH_HEAD at all.

Teach "git fetch" a command line option "--[no-]write-fetch-head".
The default is to write FETCH_HEAD, and the option is primarily
meant to be used with the "--no-" prefix to override this default,
because there is no matching fetch.writeFetchHEAD configuration
variable to flip the default to off (in which case, the positive
form may become necessary to defeat it).

Note that under "--dry-run" mode, FETCH_HEAD is never written;
otherwise you'd see list of objects in the file that you do not
actually have.  Passing `--write-fetch-head` does not force `git
fetch` to write the file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 12:56:57 -07:00
Junio C Hamano eca8c62a50 Merge branch 'jk/log-fp-implies-m'
"git log --first-parent -p" showed patches only for single-parent
commits on the first-parent chain; the "--first-parent" option has
been made to imply "-m".  Use "--no-diff-merges" to restore the
previous behaviour to omit patches for merge commits.

* jk/log-fp-implies-m:
  doc/git-log: clarify handling of merge commit diffs
  doc/git-log: move "-t" into diff-options list
  doc/git-log: drop "-r" diff option
  doc/git-log: move "Diff Formatting" from rev-list-options
  log: enable "-m" automatically with "--first-parent"
  revision: add "--no-diff-merges" option to counteract "-m"
  log: drop "--cc implies -m" logic
2020-08-17 17:02:49 -07:00
Junio C Hamano 47f0f94bc7 Merge branch 'al/bisect-first-parent'
"git bisect" learns the "--first-parent" option to find the first
breakage along the first-parent chain.

* al/bisect-first-parent:
  bisect: combine args passed to find_bisection()
  bisect: introduce first-parent flag
  cmd_bisect__helper: defer parsing no-checkout flag
  rev-list: allow bisect and first-parent flags
  t6030: modernize "git bisect run" tests
2020-08-17 17:02:45 -07:00
Jeff King 55fe225dde submodule--helper: fix leak of core.worktree value
In the ensure_core_worktree() function, we load the core.worktree value
of the submodule repository using repo_config_get_string(). This
function copies the string, but we never free it, leaking the memory.

We can instead use the "tmp" version of that function to avoid the
allocation at all. We don't have to worry about lifetime issues, since
we never even look at the value (we just want to know if it's set).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 15:35:47 -07:00
Phillip Wood 7573cec52c rebase -i: support --committer-date-is-author-date
Rebase is implemented with two different backends - 'apply' and
'merge' each of which support a different set of options. In
particular the apply backend supports a number of options implemented
by 'git am' that are not implemented in the merge backend. This means
that the available options are different depending on which backend is
used which is confusing. This patch adds support for the
--committer-date-is-author-date option to the merge backend. This
option uses the author date of the commit that is being rewritten as
the committer date when the new commit is created.

Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 11:58:37 -07:00
Phillip Wood e8cbe2118a am: stop exporting GIT_COMMITTER_DATE
The implementation of --committer-date-is-author-date exports
GIT_COMMITTER_DATE to override the default committer date but does not
reset GIT_COMMITTER_DATE in the environment after creating the commit
so it is set in the environment of any hooks that get run. We're about
to add the same functionality to the sequencer and do not want to have
GIT_COMMITTER_DATE set when running hooks or exec commands so lets
update commit_tree_extended() to take an explicit committer so we
override the default date without setting GIT_COMMITTER_DATE in the
environment.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 11:58:37 -07:00
Jacob Keller 95e7c38539 refspec: make sure stack refspec_item variables are zeroed
A couple of functions that used struct refspec_item did not zero out the
structure memory. This can result in unexpected behavior, especially if
additional parameters are ever added to refspec_item in the future. Use
memset to ensure that unset structure members are zero.

It may make sense to convert most of these uses of struct refspec_item
to use either struct initializers or refspec_item_init_or_die. However,
other similar code uses memset. Converting all of these uses has been
left as a future exercise.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 10:39:21 -07:00
Jeff King f1de981e8b config: fix leaks from git_config_get_string_const()
There are two functions to get a single config string:

  - git_config_get_string()

  - git_config_get_string_const()

One might naively think that the first one allocates a new string and
the second one just points us to the internal configset storage. But
in fact they both allocate a new copy; the second one exists only to
avoid having to cast when using it with a const global which we never
intend to free.

The documentation for the function explains that clearly, but it seems
I'm not alone in being surprised by this. Of 17 calls to the function,
13 of them leak the resulting value.

We could obviously fix these by adding the appropriate free(). But it
would be simpler still if we actually had a non-allocating way to get
the string. There's git_config_get_value() but that doesn't quite do
what we want. If the config key is present but is a boolean with no
value (e.g., "[foo]bar" in the file), then we'll get NULL (whereas the
string versions will print an error and die).

So let's introduce a new variant, git_config_get_string_tmp(), that
behaves as these callers expect. We need a new name because we have new
semantics but the same function signature (so even if we converted the
four remaining callers, topics in flight might be surprised). The "tmp"
is because this value should only be held onto for a short time. In
practice it's rare for us to clear and refresh the configset,
invalidating the pointer, but hopefully the "tmp" makes callers think
about the lifetime. In each of the converted cases here the value only
needs to last within the local function or its immediate caller.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-14 10:52:04 -07:00
Jeff King c514c62a4f checkout: fix leak of non-existent branch names
We unconditionally write a branch name into a newly allocated buffer in
new_branch_info->path, via setup_branch_path(). We then check to see if
the branch exists; if not, we set that field to NULL, leaking the
memory. We should take care to free() it when doing so.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-14 10:52:04 -07:00
Jeff King 9101c8ea2d submodule--helper: use strbuf_release() to free strbufs
The prepare_to_clone_next_submodule() function has a few local-variable
strbufs. We use strbuf_reset() throughout the function to reuse the
buffers over and over. But at the end of the function we also use
strbuf_reset() as they go out of scope, which means we end up leaking
their heap buffers. This should be strbuf_release() instead.

These were introduced by 48308681b0 (git submodule update: have a
dedicated helper for cloning, 2016-02-29), but it doesn't seem to have
the same mistake elsewhere. Likewise, I looked for other instances of
the pattern in the submodule--helper file but couldn't find any.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-14 10:52:04 -07:00
Junio C Hamano 4279000d3e messages: avoid SHA-1 in end-user facing messages
There are still a handful mentions of SHA-1 when we meant the
(hexadecimal) object names in end-user facing messages.  Rewrite
them.

I was hoping that this can mostly be s/SHA-1/object name/, but
a few messages needed rephrasing to keep the result readable.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-14 09:33:37 -07:00
Junio C Hamano d1a8a8979d Merge branch 'jt/has_object'
A new helper function has_object() has been introduced to make it
easier to mark object existence checks that do and don't want to
trigger lazy fetches, and a few such checks are converted using it.

* jt/has_object:
  fsck: do not lazy fetch known non-promisor object
  pack-objects: no fetch when allow-{any,promisor}
  apply: do not lazy fetch when applying binary
  sha1-file: introduce no-lazy-fetch has_object()
2020-08-13 14:13:39 -07:00
Jeff King 3e19816dc0 ls-remote: simplify UNLEAK() usage
We UNLEAK() the "sorting" list created by parsing command-line options
(which is essentially used until the program exits). But we do so right
before leaving the cmd_ls_remote() function, which means we have to hit
all of the exits. But the point of UNLEAK() is that it's an annotation
which doesn't impact the variable itself. We can mark it as soon as
we're done writing its value, and then we only have to do so once.

This gives us a minor code reduction, and serves as a better example of
how UNLEAK() can be used.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-13 11:05:26 -07:00
Jeff King a006f875e2 make git-fast-import a builtin
There's no reason that git-fast-import benefits from being a separate
binary. And as it links against libgit.a, it has a non-trivial disk
footprint. Let's make it a builtin, which reduces the size of a stripped
installation from 22MB to 21MB.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-13 11:02:13 -07:00
Jeff King d7a5649c82 make git-bugreport a builtin
There's no reason that bugreport has to be a separate binary. And since
it links against libgit.a, it has a rather large disk footprint. Let's
make it a builtin, which reduces the size of a stripped installation
from 24MB to 22MB.

This also simplifies our Makefile a bit. And we can take advantage of
builtin niceties like RUN_SETUP_GENTLY.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-13 11:02:12 -07:00
Jeff King b5dd96b70a make credential helpers builtins
There's no real reason for credential helpers to be separate binaries. I
did them this way originally under the notion that helper don't _need_
to be part of Git, and so can be built totally separately (and indeed,
the ones in contrib/credential are). But the ones in our main Makefile
build on libgit.a, and the resulting binaries are reasonably large.

We can slim down our total disk footprint by just making them builtins.
This reduces the size of:

  make strip install

from 29MB to 24MB on my Debian system.

Note that credential-cache can't operate without support for Unix
sockets. Currently we just don't build it at all when NO_UNIX_SOCKETS is
set. We could continue that with conditionals in the Makefile and our
list of builtins. But instead, let's build a dummy implementation that
dies with an informative message. That has two advantages:

  - it's simpler, because the conditional bits are all kept inside
    the credential-cache source

  - a user who is expecting it to exist will be told _why_ they can't
    use it, rather than getting the "credential-cache is not a git
    command" error which makes it look like the Git install is broken.

Note that our dummy implementation does still respond to "-h" in order
to appease t0012 (and this may be a little friendlier for users, as
well).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-13 11:02:08 -07:00
Prathamesh Chavan e83e3333b5 submodule: port submodule subcommand 'summary' from shell to C
Convert submodule subcommand 'summary' to a builtin and call it via
'git-submodule.sh'.

The shell version had to call $diff_cmd twice, once to find the modified
modules cared by the user and then again, with that list of modules
to do various operations for computing the summary of those modules.
On the other hand, the C version does not need a second call to
$diff_cmd since it reuses the module list from the first call to do the
aforementioned tasks.

In the C version, we use the combination of setting a child process'
working directory to the submodule path and then calling
'prepare_submodule_repo_env()' which also sets the 'GIT_DIR' to '.git',
so that we can be certain that those spawned processes will not access
the superproject's ODB by mistake.

A behavioural difference between the C and the shell version is that the
shell version outputs two line feeds after the 'git log' output when run
outside of the tests while the C version outputs one line feed in any
case. The reason for this is that the shell version calls log with
'--pretty=format:<fmt>' whose output is followed by two echo
calls; 'format' does not have "terminator" semantics like its 'tformat'
counterpart. So, the log output is terminated by a newline only when
invoked by the user and not when invoked from the scripts. This results
in the one & two line feed differences in the shell version.
On the other hand, the C version calls log with '--pretty=<fmt>'
which is equivalent to '--pretty:tformat:<fmt>' which is then
followed by a 'printf("\n")'. Due to its "terminator" semantics the
log output is always terminated by newline and hence one line feed in
any case.

Also, when we try to pass an option-like argument after a non-option
argument, for instance:

    git submodule summary HEAD --foo-bar

    (or)

    git submodule summary HEAD --cached

That argument would be treated like a path to the submodule for which
the user is requesting a summary. So, the option ends up having no
effect. Though, passing '--quiet' is an exception to this:

    git submodule summary HEAD --quiet

While 'summary' doesn't support '--quiet', we don't get an output for
the above command as '--quiet' is treated as a path which means we get
an output only if a submodule whose path is '--quiet' exists.

The error message in case of computing a summary for non-existent
submodules in the C version is different from that of the shell version.
Since the new error message is not marked for translation, change the
'test_i18ngrep' in t7421.4 to 'grep'.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Stefan Beller <stefanbeller@gmail.com>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-12 14:12:58 -07:00
Shourya Shukla 6414c3d316 submodule: remove extra line feeds between callback struct and macro
Many `submodule--helper` subcommands follow the convention that a struct
defines their callback data, and the declaration of that struct is
followed immediately by a macro to use in static initializers, without
any separating empty line.

Let's align the `init`, `status` and `sync` subcommands with that convention.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-12 14:12:58 -07:00
Junio C Hamano e0ad9574dd Merge branch 'bc/sha-256-part-3'
The final leg of SHA-256 transition.

* bc/sha-256-part-3: (39 commits)
  t: remove test_oid_init in tests
  docs: add documentation for extensions.objectFormat
  ci: run tests with SHA-256
  t: make SHA1 prerequisite depend on default hash
  t: allow testing different hash algorithms via environment
  t: add test_oid option to select hash algorithm
  repository: enable SHA-256 support by default
  setup: add support for reading extensions.objectformat
  bundle: add new version for use with SHA-256
  builtin/verify-pack: implement an --object-format option
  http-fetch: set up git directory before parsing pack hashes
  t0410: mark test with SHA1 prerequisite
  t5308: make test work with SHA-256
  t9700: make hash size independent
  t9500: ensure that algorithm info is preserved in config
  t9350: make hash size independent
  t9301: make hash size independent
  t9300: use $ZERO_OID instead of hard-coded object ID
  t9300: abstract away SHA-1-specific constants
  t8011: make hash size independent
  ...
2020-08-11 18:04:11 -07:00
Junio C Hamano 995c71986a Merge branch 'pb/guide-docs'
Update "git help guides" documentation organization.

* pb/guide-docs:
  git.txt: add list of guides
  Documentation: don't hardcode command categories twice
  help: drop usage of 'common' and 'useful' for guides
  command-list.txt: add missing 'gitcredentials' and 'gitremote-helpers'
2020-08-10 10:24:04 -07:00
Junio C Hamano 4339259d5f Merge branch 'en/eol-attrs-gotchas'
All "mergy" operations that internally use the merge-recursive
machinery should honor the merge.renormalize configuration, but
many of them didn't.

* en/eol-attrs-gotchas:
  checkout: support renormalization with checkout -m <paths>
  merge: make merge.renormalize work for all uses of merge machinery
  t6038: remove problematic test
  t6038: make tests fail for the right reason
2020-08-10 10:24:02 -07:00
Junio C Hamano 46b225f153 Merge branch 'jk/strvec'
The argv_array API is useful for not just managing argv but any
"vector" (NULL-terminated array) of strings, and has seen adoption
to a certain degree.  It has been renamed to "strvec" to reduce the
barrier to adoption.

* jk/strvec:
  strvec: rename struct fields
  strvec: drop argv_array compatibility layer
  strvec: update documention to avoid argv_array
  strvec: fix indentation in renamed calls
  strvec: convert remaining callers away from argv_array name
  strvec: convert more callers away from argv_array name
  strvec: convert builtin/ callers away from argv_array name
  quote: rename sq_dequote_to_argv_array to mention strvec
  strvec: rename files from argv-array to strvec
  argv-array: rename to strvec
  argv-array: use size_t for count and alloc
2020-08-10 10:23:57 -07:00
Eric Sunshine ccf236a23a init: disallow --separate-git-dir with bare repository
The purpose of "git init --separate-git-dir" is to separate the
repository from the worktree. This is true even when --separate-git-dir
is used on an existing worktree, in which case, it moves the .git/
subdirectory to a new location outside the worktree.

However, an outright bare repository (such as one created by "git init
--bare"), has no worktree, so using --separate-git-dir to separate it
from its non-existent worktree is nonsensical. Therefore, make it an
error to use --separate-git-dir on a bare repository.

Implementation note: "git init" considers a repository bare if told so
explicitly via --bare or if it guesses it to be so based upon
heuristics. In the explicit --bare case, a conflict with
--separate-git-dir is easy to detect early. In the guessed case,
however, the conflict can only be detected once "bareness" is guessed,
which happens after "git init" has begun creating the repository.
Technically, we can get by with a single late check which would cover
both cases, however, erroring out early, when possible, without leaving
detritus provides a better user experience.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-10 09:24:11 -07:00
Aaron Lipman ad464a4e84 bisect: combine args passed to find_bisection()
Now that find_bisection() accepts multiple boolean arguments, these may
be combined into a single unsigned integer in order to declutter some of
the code in bisect.c

Also, rename the existing "flags" bitfield to "commit_flags", to
explicitly differentiate it from the new "bisect_flags" bitfield.

Based-on-patch-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-07 15:13:03 -07:00
Aaron Lipman e8861ffc20 bisect: introduce first-parent flag
Upon seeing a merge commit when bisecting, this option may be used to
follow only the first parent.

In detecting regressions introduced through the merging of a branch, the
merge commit will be identified as introduction of the bug and its
ancestors will be ignored.

This option is particularly useful in avoiding false positives when a
merged branch contained broken or non-buildable commits, but the merge
itself was OK.

Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-07 15:13:03 -07:00
Aaron Lipman be5fe2000d cmd_bisect__helper: defer parsing no-checkout flag
cmd_bisect__helper() is intended as a temporary shim layer serving as an
interface for git-bisect.sh. This function and git-bisect.sh should
eventually be replaced by a C implementation, cmd_bisect(), serving as
an entrypoint for all "git bisect ..." shell commands: cmd_bisect() will
only parse the first token following "git bisect", and dispatch the
remaining args to the appropriate function ["bisect_start()",
"bisect_next()", etc.].

Thus, cmd_bisect__helper() should not be responsible for parsing flags
like --no-checkout. Instead, let the --no-checkout flag remain in the
argv array, so it may be evaluated alongside the other options already
parsed by bisect_start().

Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-07 15:13:03 -07:00
Aaron Lipman 0fe305a5d3 rev-list: allow bisect and first-parent flags
Add first_parent_only parameter to find_bisection(), removing the
barrier that prevented combining the --bisect and --first-parent flags
when using git rev-list

Based-on-patch-by: Tiago Botelho <tiagonbotelho@hotmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-07 15:11:59 -07:00
Jonathan Tan 9eb86f41de fsck: do not lazy fetch known non-promisor object
There is a call to has_object_file(), which lazily fetches missing
objects in a partial clone, when the object is known to not be
a promisor object. Change that call to has_object(), which does not do
any lazy fetching.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-06 13:01:03 -07:00
Jonathan Tan ee47243d76 pack-objects: no fetch when allow-{any,promisor}
The options --missing=allow-{any,promisor} were introduced in caf3827e2f
("rev-list: add list-objects filtering support", 2017-11-22) with the
following note in the commit message:

    This patch introduces handling of missing objects to help
    debugging and development of the "partial clone" mechanism,
    and once the mechanism is implemented, for a power user to
    perform operations that are missing-object aware without
    incurring the cost of checking if a missing link is expected.

The idea that these options are missing-object aware (and thus do not
need to lazily fetch objects, unlike unaware commands that assume that
all objects are present) are assumed in later commits such as 07ef3c6604
("fetch test: use more robust test for filtered objects", 2020-01-15).

However, the current implementations of these options use
has_object_file(), which indeed lazily fetches missing objects. Teach
these implementations not to do so. Also, update the documentation of
these options to be clearer.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-06 13:01:03 -07:00
Philippe Blain 0371a764d2 help: drop usage of 'common' and 'useful' for guides
Since 1b81d8cb19 (help: use command-list.txt for the source of guides,
2018-05-20), all man5/man7 guides listed in command-list.txt appear in
the output of 'git help -g'.

However, 'git help -g' still prefixes this list with "The common Git
guides are:", which makes one wonder if there are others!

In the same spirit, the man page for 'git help' describes the '--guides'
option as listing 'useful' guides, which is not false per se but can
also be taken to mean that there are other guides that exist but are not
useful.

Instead of 'common' and 'useful', use 'Git concept guides' in both
places. To keep the code in line with this change, rename
help.c::list_common_guides_help to list_guides_help.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-04 18:34:01 -07:00
Junio C Hamano 5c454b3825 Merge branch 'jt/pack-objects-prefetch-in-batch'
While packing many objects in a repository with a promissor remote,
lazily fetching missing objects from the promissor remote one by
one may be inefficient---the code now attempts to fetch all the
missing objects in batch (obviously this won't work for a lazy
clone that lazily fetches tree objects as you cannot even enumerate
what blobs are missing until you learn which trees are missing).

* jt/pack-objects-prefetch-in-batch:
  pack-objects: prefetch objects to be packed
  pack-objects: refactor to oid_object_info_extended
2020-08-04 13:53:57 -07:00
Elijah Newren 00906d6f22 checkout: support renormalization with checkout -m <paths>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03 11:48:15 -07:00
Elijah Newren 8d552258f4 merge: make merge.renormalize work for all uses of merge machinery
The 'merge' command is not the only one that does merges; other commands
like checkout -m or rebase do as well.  Unfortunately, the only area of
the code that checked for the "merge.renormalize" config setting was in
builtin/merge.c, meaning it could only affect merges performed by the
"merge" command.  Move the handling of this config setting to
merge_recursive_config() so that other commands can benefit from it as
well.  Fixes a few tests in t6038.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03 11:48:15 -07:00
Junio C Hamano add0a35caa Merge branch 'rs/grep-simpler-parse-object-or-die-call' into master
* rs/grep-simpler-parse-object-or-die-call:
  grep: avoid using oid_to_hex() with parse_object_or_die()
2020-07-30 21:34:30 -07:00
Jeff King d70a9eb611 strvec: rename struct fields
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").

Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 19:18:06 -07:00
Junio C Hamano be2dab9c80 Merge branch 'ct/mv-unmerged-path-error' into master
"git mv src dst", when src is an unmerged path, errored out
correctly but with an incorrect error message to claim that src is
not tracked, which has been clarified.

* ct/mv-unmerged-path-error:
  git-mv: improve error message for conflicted file
2020-07-30 13:20:35 -07:00
Junio C Hamano 3161cc6e6b Merge branch 'hn/reftable' into master
Preliminary clean-up of the refs API in preparation for adding a
new refs backend "reftable".

* hn/reftable:
  reflog: cleanse messages in the refs.c layer
  bisect: treat BISECT_HEAD as a pseudo ref
  t3432: use git-reflog to inspect the reflog for HEAD
  lib-t6000.sh: write tag using git-update-ref
2020-07-30 13:20:32 -07:00
Junio C Hamano f175e9b845 Merge branch 'bw/fail-cloning-into-non-empty' into master
"git clone --separate-git-dir=$elsewhere" used to stomp on the
contents of the existing directory $elsewhere, which has been
taught to fail when $elsewhere is not an empty directory.

* bw/fail-cloning-into-non-empty:
  git clone: don't clone into non-empty directory
2020-07-30 13:20:32 -07:00
Junio C Hamano 70cdbbe3a7 Merge branch 'ds/commit-graph-bloom-updates' into master
Updates to the changed-paths bloom filter.

* ds/commit-graph-bloom-updates:
  commit-graph: check all leading directories in changed path Bloom filters
  revision: empty pathspecs should not use Bloom filters
  revision.c: fix whitespace
  commit-graph: check chunk sizes after writing
  commit-graph: simplify chunk writes into loop
  commit-graph: unify the signatures of all write_graph_chunk_*() functions
  commit-graph: persist existence of changed-paths
  bloom: fix logic in get_bloom_filter()
  commit-graph: change test to die on parse, not load
  commit-graph: place bloom_settings in context
2020-07-30 13:20:31 -07:00
brian m. carlson eff45daab8 repository: enable SHA-256 support by default
Now that we have a complete SHA-256 implementation in Git, let's enable
it so people can use it.  Remove the ENABLE_SHA256 define constant
everywhere it's used.  Add tests for initializing a repository with
SHA-256.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 09:16:49 -07:00
brian m. carlson c5aecfc866 bundle: add new version for use with SHA-256
Currently we detect the hash algorithm in use by the length of the
object ID.  This is inelegant and prevents us from using a different
hash algorithm that is also 256 bits in length.

Since we cannot extend the v2 format in a backward-compatible way, let's
add a v3 format, which is identical, except for the addition of
capabilities, which are prefixed by an at sign.  We add "object-format"
as the only capability and reject unknown capabilities, since we do not
have a network connection and therefore cannot negotiate with the other
side.

For compatibility, default to the v2 format for SHA-1 and require v3
for SHA-256.

In t5510, always use format v3 so we can be sure we produce consistent
results across hash algorithms.  Since head -n N lists the top N lines
instead of the Nth line, let's run our output through sed to normalize
it and compare it against a fixed value, which will make sure we get
exactly what we're expecting.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 09:16:48 -07:00
brian m. carlson e74b606d47 builtin/verify-pack: implement an --object-format option
A recently added test in t5702 started using git verify-pack outside of
a repository.  While this poses no problems with SHA-1, with SHA-256 we
implicitly rely on the setup of the repository to initialize our hash
algorithm settings.

Since we're not in a repository here, we need to provide git verify-pack
help to set things up properly.  git index-pack already knows an
--object-format option, so let's accept one as well and pass it down to
our git index-pack invocation.  Since we're now dynamically adjusting
the elements in argv, let's switch to using struct argv_array to manage
them.  Finally, let's make t5702 pass the proper argument on down to its
git verify-pack caller.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 09:16:48 -07:00
Jeff King 9ab89a2439 log: enable "-m" automatically with "--first-parent"
When using "--first-parent" to consider history as a single line of
commits, git-log still defaults to treating merges specially, even
though they could be considered as single commits in the linearized
history (that just introduce all of the changes from the second and
higher parents).

Let's instead have "--first-parent" imply "-m", which makes something
like:

  git log --first-parent -p

do what you'd expect. Likewise:

  git log --first-parent -Sfoo

will find "foo" in merge commits.

No new test is needed; we'll tweak the output of the existing
"--first-parent -p" test, which now matches the "-m --first-parent -p"
test. The unchanged existing test for "--no-diff-merges" confirms that
the user can get the old behavior if they want.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-29 13:43:57 -07:00
Jeff King 6fae74b418 revision: add "--no-diff-merges" option to counteract "-m"
The "-m" option sets revs->ignore_merges to "0", but there's no way to
undo it. This probably isn't something anybody overly cares about, since
"1" is already the default, but it will serve as an escape hatch when we
flip the default for ignore_merges to "0" in more situations.

We'll also add a few extra niceties:

  - initialize the value to "-1" to indicate "not set", and then resolve
    it to the normal 0/1 bool in setup_revisions(). This lets any tweak
    functions, as well as setup_revisions() itself, avoid clobbering the
    user's preference (which until now they couldn't actually express).

  - since we now have --no-diff-merges, let's add the matching
    --diff-merges, which is just a synonym for "-m". Then we don't even
    need to document --no-diff-merges separately; it countermands the
    long form of "-m" in the usual way.

The new test shows that this behaves just the same as the current
behavior without "-m".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-29 13:43:57 -07:00
Jeff King eed5332a13 log: drop "--cc implies -m" logic
This was added by 82dee4160c (log: show merge commit when --cc is given,
2015-08-20), which explains why we need it. But that commit failed to
notice that setup_revisions() already does the same thing, since
cd2bdc5309 (Common option parsing for "git log --diff" and friends,
2006-04-14).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-29 13:43:57 -07:00
René Scharfe 98c6871fad grep: avoid using oid_to_hex() with parse_object_or_die()
parse_object_or_die() is passed an object ID and a name to show if the
object cannot be parsed.  If the name is NULL then it shows the
hexadecimal object ID.  Use that feature instead of preparing and
passing the hexadecimal representation to the function proactively.
That's shorter and a bit more efficient.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:26:12 -07:00
Jeff King f6d8942b1f strvec: fix indentation in renamed calls
Code which split an argv_array call across multiple lines, like:

  argv_array_pushl(&args, "one argument",
                   "another argument", "and more",
		   NULL);

was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:

  strvec_pushl(&args, "one argument",
                   "another argument", "and more",
		   NULL);

Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:

  git jump grep 'strvec_.*,$'

and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King 22f9b7f3f5 strvec: convert builtin/ callers away from argv_array name
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).

This patch converts all of the files in builtin/ to keep the diff to a
manageable size.

The conversion was done purely mechanically with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe '
    s/ARGV_ARRAY/STRVEC/g;
    s/argv_array/strvec/g;
  '

and then selectively staging files with "git add builtin/". We'll deal
with any indentation/style fallouts separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King 2745b6b450 quote: rename sq_dequote_to_argv_array to mention strvec
We want to eventually drop the use of the "argv_array" name in favor of
"strvec." Unlike most other uses of the name, this one is embedded in a
function name, so the definition and all of the callers need to be
updated at the same time.

We don't technically need to update the parameter types here (our
preprocessor compat macros make the two names interchangeable), but
let's do so to keep the site consistent for now.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King dbbcd44fb4 strvec: rename files from argv-array to strvec
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe 's/argv-array.h/strvec.h/'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:17 -07:00
Jonathan Tan e00549aa9b pack-objects: prefetch objects to be packed
When an object to be packed is noticed to be missing, prefetch all
to-be-packed objects in one batch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-21 14:29:42 -07:00
Jonathan Tan 8d5cf95735 pack-objects: refactor to oid_object_info_extended
Use oid_object_info_extended() instead of oid_object_info() because a
subsequent commit needs to specify an additional flag here.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-21 14:29:42 -07:00
Chris Torek 9b906af657 git-mv: improve error message for conflicted file
'git mv' has always complained about renaming a conflicted
file, as it cannot handle multiple index entries for one file.
However, the error message it uses has been the same as the
one for an untracked file:

    fatal: not under version control, src=...

which is patently wrong.  Distinguish the two cases and
add a test to make sure we produce the correct message.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-20 14:35:43 -07:00
Junio C Hamano d1ae8ba096 Merge branch 'tb/commit-graph-no-check-oids' into master
Fix to the code to produce progress bar, which is new in the
upcoming release.

* tb/commit-graph-no-check-oids:
  commit-graph: fix "Collecting commits from input" progress line
2020-07-15 16:29:45 -07:00
SZEDER Gábor 862aead24e commit-graph: fix "Collecting commits from input" progress line
To display a progress line while reading commits from standard input
and looking them up, 5b6653e523 (builtin/commit-graph.c: dereference
tags in builtin, 2020-05-13) should have added a pair of
start_delayed_progress() and stop_progress() calls around the loop
reading stdin.  Alas, the stop_progress() call ended up at the wrong
place, after write_commit_graph(), which does all the commit-graph
computation and writing, and has several progress lines of its own.
Consequently, that new

  Collecting commits from input: 1234

progress line is overwritten by the first progress line shown by
write_commit_graph(), and its final "done" line is shown last, after
everything is finished:

  $ { sleep 3 ; git rev-list -3 HEAD ; sleep 1 ; } | ~/src/git/git commit-graph write --stdin-commits
  Expanding reachable commits in commit graph: 873402, done.
  Writing out commit graph in 4 passes: 100% (3493608/3493608), done.
  Collecting commits from input: 3, done.

Furthermore, that stop_progress() call was added after the 'cleanup'
label, where that loop reading stdin jumps in case of an error.  In
case of invalid input this then results in the "done" line shown after
the error message:

  $ { sleep 3 ; git rev-list -3 HEAD ; echo junk ; } | ~/src/git/git commit-graph write --stdin-commits
  error: unexpected non-hex object ID: junk
  Collecting commits from input: 3, done.

Move that stop_progress() call to the right place.

While at it, drop the unnecessary 'if (progress)' condition protecting
the stop_progress() call, because that function is prepared to handle
a NULL progress struct.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-15 11:57:19 -07:00
Rohit Ashiwal ef484add9f rebase -i: add --ignore-whitespace flag
Rebase is implemented with two different backends - 'apply' and
'merge' each of which support a different set of options. In
particular the apply backend supports a number of options implemented
by 'git am' that are not implemented in the merge backend. This means
that the available options are different depending on which backend is
used which is confusing. This patch adds support for the
--ignore-whitespace option to the merge backend. This option treats
lines with only whitespace changes as unchanged and is implemented in
the merge backend by translating it to -Xignore-space-change.

Signed-off-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-13 07:55:37 -07:00
Han-Wen Nienhuys de966e39a8 bisect: treat BISECT_HEAD as a pseudo ref
Both the git-bisect.sh as bisect--helper inspected the file system
directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 13:53:37 -07:00
Ben Wijen dfaa209a79 git clone: don't clone into non-empty directory
When using git clone with --separate-git-dir realgitdir and
realgitdir already exists, it's content is destroyed.

So, make sure we don't clone into an existing non-empty directory.

When d45420c1 (clone: do not clean up directories we didn't create,
2018-01-02) tightened the clean-up procedure after a failed cloning
into an empty directory, it assumed that the existing directory
given is an empty one so it is OK to keep that directory, while
running the clean-up procedure that is designed to remove everything
in it (since there won't be any, anyway).  Check and make sure that
the $GIT_DIR is empty even cloning into an existing repository.

Signed-off-by: Ben Wijen <ben@wijen.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 11:43:29 -07:00
Junio C Hamano 46be023084 Merge branch 'ct/diff-with-merge-base-clarification' into master
Recent update to "git diff" meant as a code clean-up introduced a
bug in its error handling code, which has been corrected.

* ct/diff-with-merge-base-clarification:
  diff: check for merge bases before assigning sym->base
2020-07-09 14:00:43 -07:00
Junio C Hamano 8251695fe7 Merge branch 'cc/cat-file-usage-update' into master
Doc/usage update.

* cc/cat-file-usage-update:
  cat-file: add missing [=<format>] to usage/synopsis
2020-07-09 14:00:41 -07:00
Jeff King 5f46e610cb diff: check for merge bases before assigning sym->base
In symdiff_prepare(), we iterate over the set of parsed objects to pick
out any symmetric differences, including the left, right, and base
elements. We assign the results into pointers in a "struct symdiff", and
then complain if we didn't find a base, like so:

    sym->left = rev->pending.objects[lpos].name;
    sym->right = rev->pending.objects[rpos].name;
    sym->base = rev->pending.objects[basepos].name;
    if (basecount == 0)
            die(_("%s...%s: no merge base"), sym->left, sym->right);

But the least lines are backwards. If basecount is 0, then basepos will
be -1, and we will access memory outside of the pending array. This
isn't usually that big a deal, since we don't do anything besides a
single pointer-sized read before exiting anyway, but it does violate the
C standard, and of course memory-checking tools like ASan complain.

Let's put the basecount check first. Note that we haveto split it from
the other assignments, since the die() relies on sym->left and
sym->right having been assigned (this isn't strictly necessary, but is
easier to read than dereferencing the pending array again).

Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-08 13:57:18 -07:00
Junio C Hamano 0a23331aa6 Merge branch 'jk/fast-export-anonym-alt'
"git fast-export --anonymize" learned to take customized mapping to
allow its users to tweak its output more usable for debugging.

* jk/fast-export-anonym-alt:
  fast-export: use local array to store anonymized oid
  fast-export: anonymize "master" refname
  fast-export: allow seeding the anonymized mapping
  fast-export: add a "data" callback parameter to anonymize_str()
  fast-export: move global "idents" anonymize hashmap into function
  fast-export: use a flex array to store anonymized entries
  fast-export: stop storing lengths in anonymized hashmaps
  fast-export: tighten anonymize_mem() interface to handle only strings
  fast-export: store anonymized oids as hex strings
  fast-export: use xmemdupz() for anonymizing oids
  t9351: derive anonymized tree checks from original repo
2020-07-06 22:09:17 -07:00
Junio C Hamano 11cbda2add Merge branch 'js/default-branch-name'
The name of the primary branch in existing repositories, and the
default name used for the first branch in newly created
repositories, is made configurable, so that we can eventually wean
ourselves off of the hardcoded 'master'.

* js/default-branch-name:
  contrib: subtree: adjust test to change in fmt-merge-msg
  testsvn: respect `init.defaultBranch`
  remote: use the configured default branch name when appropriate
  clone: use configured default branch name when appropriate
  init: allow setting the default for the initial branch name via the config
  init: allow specifying the initial branch name for the new repository
  docs: add missing diamond brackets
  submodule: fall back to remote's HEAD for missing remote.<name>.branch
  send-pack/transport-helper: avoid mentioning a particular branch
  fmt-merge-msg: stop treating `master` specially
2020-07-06 22:09:17 -07:00
Junio C Hamano 0258ed1e08 Merge branch 'cb/is-descendant-of'
Code clean-up.

* cb/is-descendant-of:
  commit-reach: avoid is_descendant_of() shim
2020-07-06 22:09:16 -07:00
Junio C Hamano 645f63111b Merge branch 'es/get-worktrees-unsort'
API cleanup for get_worktrees()

* es/get-worktrees-unsort:
  worktree: drop get_worktrees() unused 'flags' argument
  worktree: drop get_worktrees() special-purpose sorting option
2020-07-06 22:09:15 -07:00
Junio C Hamano d80bea479d Merge branch 'ak/commit-graph-to-slab'
A few fields in "struct commit" that do not have to always be
present have been moved to commit slabs.

* ak/commit-graph-to-slab:
  commit-graph: minimize commit_graph_data_slab access
  commit: move members graph_pos, generation to a slab
  commit-graph: introduce commit_graph_data_slab
  object: drop parsed_object_pool->commit_count
2020-07-06 22:09:14 -07:00
Junio C Hamano 12210859da Merge branch 'bc/sha-256-part-2'
SHA-256 migration work continues.

* bc/sha-256-part-2: (44 commits)
  remote-testgit: adapt for object-format
  bundle: detect hash algorithm when reading refs
  t5300: pass --object-format to git index-pack
  t5704: send object-format capability with SHA-256
  t5703: use object-format serve option
  t5702: offer an object-format capability in the test
  t/helper: initialize the repository for test-sha1-array
  remote-curl: avoid truncating refs with ls-remote
  t1050: pass algorithm to index-pack when outside repo
  builtin/index-pack: add option to specify hash algorithm
  remote-curl: detect algorithm for dumb HTTP by size
  builtin/ls-remote: initialize repository based on fetch
  t5500: make hash independent
  serve: advertise object-format capability for protocol v2
  connect: parse v2 refs with correct hash algorithm
  connect: pass full packet reader when parsing v2 refs
  Documentation/technical: document object-format for protocol v2
  t1302: expect repo format version 1 for SHA-256
  builtin/show-index: provide options to determine hash algo
  t5302: modernize test formatting
  ...
2020-07-06 22:09:13 -07:00
Christian Couder 0172f7834a cat-file: add missing [=<format>] to usage/synopsis
When displaying cat-file usage, the fact that a <format> can
be specified is only visible when lookling at the --batch and
--batch-check options which are shown like this:

    --batch[=<format>]    show info and content of objects fed from the standard input
    --batch-check[=<format>]
                          show info about objects fed from the standard input

It seems more coherent and improves discovery to also show it
on the usage line.

In the documentation the DESCRIPTION tells us that "The output
format can be overridden using the optional <format> argument",
but we can't see the <format> argument in the SYNOPSIS above
the description which is confusing.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 15:54:05 -07:00
Derrick Stolee 0087a87ba8 commit-graph: persist existence of changed-paths
The changed-path Bloom filters were released in v2.27.0, but have a
significant drawback. A user can opt-in to writing the changed-path
filters using the "--changed-paths" option to "git commit-graph write"
but the next write will drop the filters unless that option is
specified.

This becomes even more important when considering the interaction with
gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of
features.experimental). These config options trigger commit-graph writes
that the user did not signal, and hence there is no --changed-paths
option available.

Allow a user that opts-in to the changed-path filters to persist the
property of "my commit-graph has changed-path filters" automatically. A
user can drop filters using the --no-changed-paths option.

In the process, we need to be extremely careful to match the Bloom
filter settings as specified by the commit-graph. This will allow future
versions of Git to customize these settings, and the version with this
change will persist those settings as commit-graphs are rewritten on
top.

Use the trace2 API to signal the settings used during the write, and
check that output in a test after manually adjusting the correct bytes
in the commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 14:17:43 -07:00
Junio C Hamano 298d704e70 Merge branch 'sk/diff-files-show-i-t-a-as-new'
"git diff-files" has been taught to say paths that are marked as
intent-to-add are new files, not modified from an empty blob.

* sk/diff-files-show-i-t-a-as-new:
  diff-files: treat "i-t-a" files as "not-in-index"
2020-06-29 14:17:27 -07:00
Junio C Hamano b381c98891 Merge branch 'rs/pull-leakfix'
Leakfix.

* rs/pull-leakfix:
  pull: plug minor memory leak after using is_descendant_of()
2020-06-29 14:17:26 -07:00
Junio C Hamano 1ea1f93fd9 Merge branch 'dl/diff-usage-comment-update'
An in-code comment in "git diff" has been updated.

* dl/diff-usage-comment-update:
  builtin/diff: fix botched update of usage comment
  builtin/diff: update usage comment
2020-06-29 14:17:25 -07:00
Junio C Hamano 1033b98291 Merge branch 'xl/upgrade-repo-format'
Allow runtime upgrade of the repository format version, which needs
to be done carefully.

There is a rather unpleasant backward compatibility worry with the
last step of this series, but it is the right thing to do in the
longer term.

* xl/upgrade-repo-format:
  check_repository_format_gently(): refuse extensions for old repositories
  sparse-checkout: upgrade repository to version 1 when enabling extension
  fetch: allow adding a filter after initial clone
  repository: add a helper function to perform repository format upgrade
2020-06-29 14:17:24 -07:00
Jeff King f39ad38410 fast-export: use local array to store anonymized oid
Some older versions of gcc complain about this line:

  builtin/fast-export.c:412:2: error: dereferencing type-punned pointer
       will break strict-aliasing rules [-Werror=strict-aliasing]
    put_be32(oid.hash + hashsz - 4, counter++);
    ^

This seems to be a false positive, as there's no type-punning at all
here. oid.hash is an array of unsigned char; when we pass it to a
function it decays to a pointer to unsigned char. We do take a void
pointer in put_be32(), but it's immediately aliased with another pointer
to unsigned char (and clearly the compiler is looking inside the inlined
put_be32(), since the warning doesn't happen with -O0).

This happens on gcc 4.8 and 4.9, but not later versions (I tested gcc 6,
7, 8, and 9).

We can work around it by using a local array instead of an object_id
struct. This is a little more intimate with the details of object_id,
but for whatever reason doesn't seem to trigger the compiler warning.
We can revert this patch once we decide that those gcc versions are too
old to care about for a warning like this (gcc 4.8 is the default
compiler for Ubuntu Trusty, which is out-of-support but not fully
end-of-life'd until April 2022).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 14:19:23 -07:00
Jeff King 8a49495583 fast-export: anonymize "master" refname
Running "fast-export --anonymize" will leave "refs/heads/master"
untouched in the output, for two reasons:

  - it helped to have some known reference point between the original
    and anonymized repository

  - since it's historically the default branch name, it doesn't leak any
    information

Now that we can ask fast-export to retain particular tokens, we have a
much better tool for the first one (because it works for any ref, not
just master).

For the second, the notion of "default branch name" is likely to become
configurable soon, at which point the name _does_ leak information.
Let's drop this special case in preparation.

Note that we have to adjust the test a bit, since it relied on using the
name "master" in the anonymized repos. We could just use
--anonymize-map=master to keep the same output, but then we wouldn't
know if it works because of our hard-coded master or because of the
explicit map.

So let's flip the test a bit, and confirm that we anonymize "master",
but keep "other" in the output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 14:19:23 -07:00
Jeff King 65b5d9fae7 fast-export: allow seeding the anonymized mapping
After you anonymize a repository, it can be hard to find which commits
correspond between the original and the result, and thus hard to
reproduce commands that triggered bugs in the original.

Let's make it possible to seed the anonymization map. This lets users
either:

  - mark names to be retained as-is, if they don't consider them secret
    (in which case their original commands would just work)

  - map names to new values, which lets them adapt the reproduction
    recipe to the new names without revealing the originals

The implementation is fairly straight-forward. We already store each
anonymized token in a hashmap (so that the same token appearing twice is
converted to the same result). We can just introduce a new "seed"
hashmap which is consulted first.

This does make a few more promises to the user about how we'll anonymize
things (e.g., token-splitting pathnames). But it's unlikely that we'd
want to change those rules, even if the actual anonymization of a single
token changes. And it makes things much easier for the user, who can
unblind only a directory name without having to specify each path within
it.

One alternative to this approach would be to anonymize as we see fit,
and then dump the whole refname and pathname mappings to a file. This
does work, but it's a bit awkward to use (you have to manually dig the
items you care about out of the mapping).

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 14:19:23 -07:00
Junio C Hamano 34e849b05a Merge branch 'jt/cdn-offload'
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.

* jt/cdn-offload:
  upload-pack: fix a sparse '0 as NULL pointer' warning
  upload-pack: send part of packfile response as uri
  fetch-pack: support more than one pack lockfile
  upload-pack: refactor reading of pack-objects out
  Documentation: add Packfile URIs design doc
  Documentation: order protocol v2 sections
  http-fetch: support fetching packfiles by URL
  http-fetch: refactor into function
  http: refactor finish_http_pack_request()
  http: use --stdin when indexing dumb HTTP pack
2020-06-25 12:27:47 -07:00
Junio C Hamano 10462829e3 Merge branch 'ss/submodule-set-branch-in-c'
Rewrite of parts of the scripted "git submodule" Porcelain command
continues; this time it is "git submodule set-branch" subcommand's
turn.

* ss/submodule-set-branch-in-c:
  submodule: port subcommand 'set-branch' from shell to C
2020-06-25 12:27:47 -07:00
Junio C Hamano 7b2685ef2d Merge branch 'dl/branch-cleanup'
Code clean-up around "git branch" with a minor bugfix.

* dl/branch-cleanup:
  branch: don't mix --edit-description
  t3200: test for specific errors
  t3200: rename "expected" to "expect"
2020-06-25 12:27:47 -07:00
Junio C Hamano 1457886ce2 Merge branch 'ct/diff-with-merge-base-clarification'
"git diff" used to take arguments in random and nonsense range
notation, e.g. "git diff A..B C", "git diff A..B C...D", etc.,
which has been cleaned up.

* ct/diff-with-merge-base-clarification:
  Documentation: usage for diff combined commits
  git diff: improve range handling
  t/t3430: avoid undefined git diff behavior
2020-06-25 12:27:46 -07:00
Junio C Hamano 53674699c0 Merge branch 'en/clean-cleanups'
Code clean-up of "git clean" resulted in a fix of recent
performance regression.

* en/clean-cleanups:
  clean: optimize and document cases where we recurse into subdirectories
  clean: consolidate handling of ignored parameters
  dir, clean: avoid disallowed behavior
  dir: fix a few confusing comments
2020-06-25 12:27:45 -07:00
Johannes Schindelin 0cc1b475bb clone: use configured default branch name when appropriate
When cloning a repository without any branches, Git chooses a default
branch name for the as-yet unborn branch.

As part of the implicit initialization of the local repository, Git just
learned to respect `init.defaultBranch` to choose a different initial
branch name. We now really want that branch name to be used as a
fall-back.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Don Goodman-Wilson 8747ebb7cd init: allow setting the default for the initial branch name via the config
We just introduced the command-line option
`--initial-branch=<branch-name>` to allow initializing a new repository
with a different initial branch than the hard-coded one.

To allow users to override the initial branch name more permanently
(i.e. without having to specify the name manually for each and every
`git init` invocation), let's introduce the `init.defaultBranch` config
setting.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Don Goodman-Wilson <don@goodman-wilson.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Johannes Schindelin 32ba12dab2 init: allow specifying the initial branch name for the new repository
There is a growing number of projects and companies desiring to change
the main branch name of their repositories (see e.g.
https://twitter.com/mislav/status/1270388510684598272 for background on
this).

To change that branch name for new repositories, currently the only way
to do that automatically is by copying all of Git's template directory,
then hard-coding the desired default branch name into the `.git/HEAD`
file, and then configuring `init.templateDir` to point to those copied
template files.

To make this process much less cumbersome, let's introduce a new option:
`--initial-branch=<branch-name>`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Johannes Schindelin f0a96e8d4c submodule: fall back to remote's HEAD for missing remote.<name>.branch
When `remote.<name>.branch` is not configured, `git submodule update`
currently falls back to using the branch name `master`. A much better
idea, however, is to use the remote `HEAD`: on all Git servers running
reasonably recent Git versions, the symref `HEAD` points to the main
branch.

Note: t7419 demonstrates that there _might_ be use cases out there that
_expect_ `git submodule update --remote` to update submodules to the
remote `master` branch even if the remote `HEAD` points to another
branch. Arguably, this patch makes the behavior more intuitive, but
there is a slight possibility that this might cause regressions in
obscure setups.

Even so, it should be okay to fix this behavior without anything like a
longer transition period:

- The `git submodule update --remote` command is not really common.

- Current Git's behavior when running this command is outright
  confusing, unless the remote repository's current branch _is_ `master`
  (in which case the proposed behavior matches the old behavior).

- If a user encounters a regression due to the changed behavior, the fix
  is actually trivial: setting `submodule.<name>.branch` to `master`
  will reinstate the old behavior.

Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Jeff King d5bf91fde4 fast-export: add a "data" callback parameter to anonymize_str()
The anonymize_str() function takes a generator callback, but there's no
way to pass extra context to it. Let's add the usual "void *data"
parameter to the generator interface and pass it along.

This is mildly annoying for existing callers, all of which pass NULL,
but is necessary to avoid extra globals in some cases we'll add in a
subsequent patch.

While we're touching each of these callbacks, we can further observe
that none of them use the existing orig/len parameters at all. This
makes sense, since the point is for their output to have no discernable
basis in the original (my original version had some notion that we might
use a one-way function to obfuscate the names, but it was never
implemented). So let's drop those extra parameters. If a caller really
wants to do something with them, it can pass a struct through the new
data parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
Jeff King 6416a865da fast-export: move global "idents" anonymize hashmap into function
All of the other anonymization functions keep their static mappings
inside the function to avoid polluting the global namespace. Let's do
the same for "idents", as nobody needs it outside of
anonymize_ident_line().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
Jeff King 55b01456a9 fast-export: use a flex array to store anonymized entries
Now that we're using a separate keydata struct for hash lookups, we have
more flexibility in how we allocate anonymized_entry structs. Let's push
the "orig" key into a flex member within the struct. That should save us
a few bytes of memory per entry (a pointer plus any malloc overhead),
and may make lookups a little faster (since it's one less pointer to
chase in the comparison function).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
Jeff King a0f65641df fast-export: stop storing lengths in anonymized hashmaps
Now that the anonymize_str() interface is restricted to NUL-terminated
strings, there's no need for us to keep track of the length of each
entry in the hashmap. This simplifies the code and saves a bit of
memory.

Note that we do still need to compare the stored results to partial
strings passed in by the callers. We can do that by using hashmap's
keydata feature to get the ptr/len pair into the comparison function,
and then using strncmp().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
Jeff King 7f40759496 fast-export: tighten anonymize_mem() interface to handle only strings
While the anonymize_mem() interface _can_ store arbitrary byte
sequences, none of the callers uses this feature (as of the previous
commit). We'd like to keep it that way, as we'll be exposing the
string-like nature of the anonymization routines to the user. So let's
tighten up the interface a bit:

  - don't treat "len" as an out-parameter from anonymize_mem(); this
    ensures callers treat the pointer result as a NUL-terminated string

  - likewise, don't treat "len" as an out-parameter from generator
    functions

  - swap out "void *" for "char *" as appropriate to signal that we
    don't handle arbitrary memory

  - rename the function to anonymize_str()

This will also open up some optimization opportunities in a future
patch.

Note that we can't drop the "len" parameter entirely. Some callers do
pass in partial strings (e.g., "foo/bar", len=3) to avoid copying, and
we need to handle those still.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
Jeff King 750bb32589 fast-export: store anonymized oids as hex strings
When fast-export stores anonymized oids, it does so as binary strings.
And while the anonymous mapping storage is binary-clean (at least as of
the previous commit), this will become awkward when we start exposing
more of it to the user. In particular, if we allow a method for
retaining token "foo", then users may want to specify a hex oid as such
a token.

Let's just switch to storing the hex strings. The difference in memory
usage is negligible (especially considering how infrequently we'd
generally store an oid compared to, say, path components).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
Jeff King b897bf5f37 fast-export: use xmemdupz() for anonymizing oids
Our anonymize_mem() function is careful to take a ptr/len pair to allow
storing binary tokens like object ids, as well as partial strings (e.g.,
just "foo" of "foo/bar"). But it duplicates the hash key using
xstrdup()! That means that:

  - for a partial string, we'd store all bytes up to the NUL, even
    though we'd never look at anything past "len". This didn't produce
    wrong behavior, but was wasteful.

  - for a binary oid that doesn't contain a zero byte, we'd copy garbage
    bytes off the end of the array (though as long as nothing complained
    about reading uninitialized bytes, further reads would be limited by
    "len", and we'd produce the correct results)

  - for a binary oid that does contain a zero byte, we'd copy _fewer_
    bytes than intended into the hashmap struct. When we later try to
    look up a value, we'd access uninitialized memory and potentially
    falsely claim that a particular oid is not present.

The most common reason to store an oid is an anonymized gitlink, but our
test case doesn't have any gitlinks at all. So let's add one whose oid
contains a NUL and is present at two different paths. ASan catches the
memory error, but even without it we can detect the bug because the oid
is not anonymized the same way for both paths.

And of course the fix is to copy the correct number of bytes. We don't
technically need the appended NUL from xmemdupz(), but it doesn't hurt
as an extra protection against anybody treating it like a string (plus a
future patch will push us more in that direction).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
Denton Liu c592fd4c83 builtin/diff: fix botched update of usage comment
In the previous commit, an attempt was made to correct the "N=1, M=0"
case. However, the fix was botched and it introduced two half-correct
sections by mistake. Combine these half-correct sections into one fully
correct section.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 16:39:41 -07:00
Carlo Marcelo Arenas Belón c1ea625f72 commit-reach: avoid is_descendant_of() shim
d91d6fbf26 (commit-reach: create repo_is_descendant_of(), 2020-06-17)
adds a repository aware version of is_descendant_of() and a backward
compatibility shim that is barely used.

Update all callers to directly use the new repo_is_descendant_of()
function instead; making the codebase simpler and pushing more
the_repository references higher up the stack.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 16:36:53 -07:00
Junio C Hamano 9740ef888e Merge branch 'es/worktree-duplicate-paths'
The same worktree directory must be registered only once, but
"git worktree move" allowed this invariant to be violated, which
has been corrected.

* es/worktree-duplicate-paths:
  worktree: make "move" refuse to move atop missing registered worktree
  worktree: generalize candidate worktree path validation
  worktree: prune linked worktree referencing main worktree path
  worktree: prune duplicate entries referencing same worktree path
  worktree: make high-level pruning re-usable
  worktree: give "should be pruned?" function more meaningful name
  worktree: factor out repeated string literal
2020-06-22 15:55:03 -07:00
Srinidhi Kaushik feea6946a5 diff-files: treat "i-t-a" files as "not-in-index"
The `diff-files' command and related commands which call the function
`cmd_diff_files()', consider the "intent-to-add" files as a part of the
index when comparing the work-tree against it. This was previously
addressed in commits [1] and [2] by turning the option
`--ita-invisible-in-index' (introduced in [3]) on by default.

For `diff-files' (and `add -p' as a consequence) to show the i-t-a
files as as new, `ita_invisible_in_index' will be enabled by default
here as well.

[1] 0231ae71d3 (diff: turn --ita-invisible-in-index on by default,
                2018-05-26)
[2] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist
                in index", 2016-10-24)
[3] b42b451919 (diff: add --ita-[in]visible-in-index, 2016-10-24)

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 10:46:45 -07:00
Eric Sunshine 03f2465bb1 worktree: drop get_worktrees() unused 'flags' argument
get_worktrees() accepts a 'flags' argument, however, there are no
existing flags (the lone flag GWT_SORT_LINKED was recently retired) and
no behavior which can be tweaked. Therefore, drop the 'flags' argument.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 10:31:15 -07:00
Eric Sunshine d9c54c2bbf worktree: drop get_worktrees() special-purpose sorting option
Of all the clients of get_worktrees(), only "git worktree list" wants
the list sorted in a very specific way; other clients simply don't care
about the order. Rather than imbuing get_worktrees() with special
knowledge about how various clients -- now and in the future -- may want
the list sorted, drop the sorting capability altogether and make it the
client's responsibility to sort the list if needed.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 10:30:29 -07:00
brian m. carlson 586740aa6e builtin/index-pack: add option to specify hash algorithm
git index-pack is usually run in a repository, but need not be. Since
packs don't contains information on the algorithm in use, instead
relying on context, add an option to index-pack to tell it which one
we're using in case someone runs it outside of a repository.  Since
using --stdin necessarily implies a repository, don't allow specifying
an object format if it's provided to prevent users from passing an
option that won't work.  Add documentation for this option.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:08 -07:00
René Scharfe 0c9a4f638a pull: plug minor memory leak after using is_descendant_of()
cmd_pull() builds a commit_list to pass a single potential ancestor to
is_descendant_of().  The latter leaves the list intact.  Release the
allocated memory after the call.

Leaking in cmd_*() isn't a big deal, but sets a bad example for other
users of is_descendant_of().

Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 12:17:21 -07:00
Denton Liu a9d7689cd4 builtin/diff: update usage comment
A comment in cmd_diff() states that if one tree-ish and no blobs are
provided, (the "N=1, M=0" case), it will provide a diff between the tree
and the cache. This is incorrect because a diff happens between the
tree-ish and the working tree. Remove the `--cached` in the comment so
that the correct behavior is shown. Add a new section describing the
"N=1, M=0, --cached" behavior.

Next, describe the "N=0, M=0, --cached" case, similar to the above since
it is undocumented.

Finally, fix some spacing issues. Add spaces between each section for
consistency and readability. Also, change tabs within the comment into
spaces.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-18 15:01:15 -07:00
Junio C Hamano a554228ffb Merge branch 'en/sparse-checkout'
The behaviour of "sparse-checkout" in the state "git clone
--no-checkout" left was changed accidentally in 2.27, which has
been corrected.

* en/sparse-checkout:
  sparse-checkout: avoid staging deletions of all files
2020-06-17 21:54:02 -07:00
Junio C Hamano 524caf8035 Merge branch 'js/reflog-anonymize-for-clone-and-fetch'
The reflog entries for "git clone" and "git fetch" did not
anonymize the URL they operated on.

* js/reflog-anonymize-for-clone-and-fetch:
  clone/fetch: anonymize URLs in the reflog
2020-06-17 21:54:01 -07:00
Abhishek Kumar 6da43d937c object: drop parsed_object_pool->commit_count
14ba97f8 (alloc: allow arbitrary repositories for alloc functions,
2018-05-15) introduced parsed_object_pool->commit_count to keep count of
commits per repository and was used to assign commit->index.

However, commit-slab code requires commit->index values to be unique
and a global count would be correct, rather than a per-repo count.

Let's introduce a static counter variable, `parsed_commits_count` to
keep track of parsed commits so far.

As commit_count has no use anymore, let's also drop it from the struct.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17 14:37:14 -07:00
Denton Liu dc44639904 branch: don't mix --edit-description
`git branch` accepts `--edit-description` in conjunction with other
arguments. However, `--edit-description` is its own mode, similar to
`--set-upstream-to`, which is also made mutually exclusive with other
modes. Prevent `--edit-description` from being mixed with other modes.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17 11:12:34 -07:00
Elijah Newren 7233f17577 clean: optimize and document cases where we recurse into subdirectories
Commit 6b1db43109 ("clean: teach clean -d to preserve ignored paths",
2017-05-23) added the following code block (among others) to git-clean:
    if (remove_directories)
        dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
The reason for these flags is well documented in the commit message, but
isn't obvious just from looking at the code.  Add some explanations to
the code to make it clearer.

Further, it appears git-2.26 did not correctly handle this combination
of flags from git-clean.  With both these flags and without
DIR_SHOW_IGNORED_TOO_MODE_MATCHING set, git is supposed to recurse into
all untracked AND ignored directories.  git-2.26.0 clearly was not doing
that.  I don't know the full reasons for that or whether git < 2.27.0
had additional unknown bugs because of that misbehavior, because I don't
feel it's worth digging into.  As per the huge changes and craziness
documented in commit 8d92fb2927 ("dir: replace exponential algorithm
with a linear one", 2020-04-01), the old algorithm was a mess and was
thrown out.  What I can say is that git-2.27.0 correctly recurses into
untracked AND ignored directories with that combination.

However, in clean's case we don't need to recurse into ignored
directories; that is just a waste of time.  Thus, when git-2.27.0
started correctly handling those flags, we got a performance regression
report.  Rather than relying on other bugs in fill_directory()'s former
logic to provide the behavior of skipping ignored directories, make use
of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in
commit eec0f7f2b7 ("status: add option to show ignored files
differently", 2017-10-30) for this purpose.

Reported-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 17:27:16 -07:00
Elijah Newren f7f5c6c0ba clean: consolidate handling of ignored parameters
I spent a long time trying to figure out how and whether the code worked
with different values of ignore, ignore_only, and remove_directories.
After lots of time setting up lots of testcases, sifting through lots of
print statements, and walking through the debugger, I finally realized
that one piece of code related to how it was all setup was found in
clean.c rather than dir.c.  Make a change that would have made it easier
for me to do the extra testing by putting this handling in one spot.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 17:27:16 -07:00
Elijah Newren 351ea1c3cb dir, clean: avoid disallowed behavior
dir.h documented quite clearly that DIR_SHOW_IGNORED and
DIR_SHOW_IGNORED_TOO are mutually exclusive, with a big comment to this
effect by the definition of both enum values.  However, a command like
   git clean -fx $DIR
would set both values for dir.flags.  I _think_ it happened to work
because:
  * As dir.h points out, DIR_KEEP_UNTRACKED_CONTENTS only takes effect
    if DIR_SHOW_IGNORED_TOO is set.
  * As coded, I believe DIR_SHOW_IGNORED would just happen to take
    precedence over DIR_SHOW_IGNORED_TOO in the code as currently
    constructed.
Which is a long way of saying "we just got lucky".

Fix clean.c to avoid setting these mutually exclusive values at the same
time, and add a check to dir.c that will throw a BUG() to prevent anyone
else from making this mistake.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 17:27:16 -07:00
Chris Torek b7e10b2ca2 Documentation: usage for diff combined commits
Document the usage for producing combined commits with "git diff".
This includes updating the synopsis section.

While here, add the three-dot notation to the synopsis.

Make "git diff -h" print the same usage summary as the manual
page synopsis, minus the "A..B" form, which is now discouraged.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 10:53:44 -07:00
Chris Torek 8bfcb3a690 git diff: improve range handling
When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).

This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.

Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:

 * If there is a symmetric difference merge base, this is used
   as the left side of the diff.  The last final ref is used as
   the right side.
 * If there is no merge base, the symmetric status is completely
   lost.  We will produce a combined diff instead.

Similar weirdness occurs if you use, e.g., "git diff C A...B D".
Likewise, using multiple two-dot ranges, or tossing extra
revision specifiers into the command line with two-dot ranges,
or mixing two and three dot ranges, all produce nonsense.

To avoid all this, add a routine to catch the range cases and
verify that that the arguments make sense.  As a side effect,
produce a warning showing *which* merge base is being used when
there are multiple choices; die if there is no merge base.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 10:53:44 -07:00
Jonathan Tan dd4b732df7 upload-pack: send part of packfile response as uri
Teach upload-pack to send part of its packfile response as URIs.

An administrator may configure a repository with one or more
"uploadpack.blobpackfileuri" lines, each line containing an OID, a pack
hash, and a URI. A client may configure fetch.uriprotocols to be a
comma-separated list of protocols that it is willing to use to fetch
additional packfiles - this list will be sent to the server. Whenever an
object with one of those OIDs would appear in the packfile transmitted
by upload-pack, the server may exclude that object, and instead send the
URI. The client will then download the packs referred to by those URIs
before performing the connectivity check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 18:06:34 -07:00
Jonathan Tan 9da69a6539 fetch-pack: support more than one pack lockfile
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.

In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.

Implementation notes:

 - builtin/fetch-pack.c normally does not generate .keep files, and thus
   is unaffected by this or future changes. However, it has an
   undocumented "--lock-pack" feature, used by remote-curl.c when
   implementing the "fetch" remote helper command. In keeping with the
   remote helper protocol, only one "lock" line will ever be written;
   the rest will result in warnings to stderr. However, in practice,
   warnings will never be written because the remote-curl.c "fetch" is
   only used for protocol v0/v1 (which will not generate multiple .keep
   files). (Protocol v2 uses the "stateless-connect" command, not the
   "fetch" command.)

 - connected.c has an optimization in that connectivity checks on a ref
   need not be done if the target object is in a pack known to be
   self-contained and connected. If there are multiple packfiles, this
   optimization can no longer be done.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 18:06:34 -07:00
Eric Sunshine 810382ed37 worktree: make "move" refuse to move atop missing registered worktree
"git worktree add" takes special care to avoid creating a new worktree
at a location already registered to an existing worktree even if that
worktree is missing (which can happen, for instance, if the worktree
resides on removable media). "git worktree move", however, is not so
careful when validating the destination location and will happily move
the source worktree atop the location of a missing worktree. This leads
to the anomalous situation of multiple worktrees being associated with
the same path, which is expressly forbidden by design. For example:

    $ git clone foo.git
    $ cd foo
    $ git worktree add ../bar
    $ git worktree add ../baz
    $ rm -rf ../bar
    $ git worktree move ../baz ../bar
    $ git worktree list
    .../foo beefd00f [master]
    .../bar beefd00f [bar]
    .../bar beefd00f [baz]
    $ git worktree remove ../bar
    fatal: validation failed, cannot remove working tree:
        '.../bar' does not point back to '.git/worktrees/bar'

Fix this shortcoming by enhancing "git worktree move" to perform the
same additional validation of the destination directory as done by "git
worktree add".

While at it, add a test to verify that "git worktree move" won't move a
worktree atop an existing (non-worktree) path -- a restriction which has
always been in place but was never tested.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
Eric Sunshine d179af679b worktree: generalize candidate worktree path validation
"git worktree add" checks that the specified path is a valid location
for a new worktree by ensuring that the path does not already exist and
is not already registered to another worktree (a path can be registered
but missing, for instance, if it resides on removable media). Since "git
worktree add" is not the only command which should perform such
validation ("git worktree move" ought to also), generalize the the
validation function for use by other callers, as well.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
Eric Sunshine 916133ef8e worktree: prune linked worktree referencing main worktree path
"git worktree prune" detects when multiple entries are associated with
the same path and prunes the duplicates, however, it does not detect
when a linked worktree points at the path of the main worktree.
Although "git worktree add" disallows creating a new worktree with the
same path as the main worktree, such a case can arise outside the
control of Git even without the user mucking with .git/worktree/<id>/
administrative files. For instance:

    $ git clone foo.git
    $ git -C foo worktree add ../bar
    $ rm -rf bar
    $ mv foo bar
    $ git -C bar worktree list
    .../bar deadfeeb [master]
    .../bar deadfeeb [bar]

Help the user recover from such corruption by extending "git worktree
prune" to also detect when a linked worktree is associated with the path
of the main worktree.

Reported-by: Jonathan Müller <jonathanmueller.dev@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
Eric Sunshine 4a3ce479ce worktree: prune duplicate entries referencing same worktree path
A fundamental restriction of linked working trees is that there must
only ever be a single worktree associated with a particular path, thus
"git worktree add" explicitly disallows creation of a new worktree at
the same location as an existing registered worktree. Nevertheless,
users can still "shoot themselves in the foot" by mucking with
administrative files in .git/worktree/<id>/. Worse, "git worktree move"
is careless[1] and allows a worktree to be moved atop a registered but
missing worktree (which can happen, for instance, if the worktree is on
removable media). For instance:

    $ git clone foo.git
    $ cd foo
    $ git worktree add ../bar
    $ git worktree add ../baz
    $ rm -rf ../bar
    $ git worktree move ../baz ../bar
    $ git worktree list
    .../foo beefd00f [master]
    .../bar beefd00f [bar]
    .../bar beefd00f [baz]

Help users recover from this form of corruption by teaching "git
worktree prune" to detect when multiple worktrees are associated with
the same path.

[1]: A subsequent commit will fix "git worktree move" validation to be
     more strict.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
Eric Sunshine dd9609a12e worktree: make high-level pruning re-usable
The low-level logic for removing a worktree is well encapsulated in
delete_git_dir(). However, high-level details related to pruning a
worktree -- such as dealing with verbosity and dry-run mode -- are not
encapsulated. Factor out this high-level logic into its own function so
it can be re-used as new worktree corruption detectors are added.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
Eric Sunshine 1b14d40b38 worktree: give "should be pruned?" function more meaningful name
Readers of the name prune_worktree() are likely to expect the function
to actually prune a worktree, however, it only answers the question
"should this worktree be pruned?". Give it a name more reflective of its
true purpose to avoid such confusion.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
Junio C Hamano 63e50b8678 Merge branch 'cb/bisect-helper-parser-fix'
The code to parse "git bisect start" command line was lax in
validating the arguments.

* cb/bisect-helper-parser-fix:
  bisect--helper: avoid segfault with bad syntax in `start --term-*`
2020-06-08 18:06:32 -07:00
Junio C Hamano b37fd14beb Merge branch 'dl/remote-curl-deadlock-fix'
On-the-wire protocol v2 easily falls into a deadlock between the
remote-curl helper and the fetch-pack process when the server side
prematurely throws an error and disconnects.  The communication has
been updated to make it more robust.

* dl/remote-curl-deadlock-fix:
  stateless-connect: send response end packet
  pkt-line: define PACKET_READ_RESPONSE_END
  remote-curl: error on incomplete packet
  pkt-line: extern packet_length()
  transport: extract common fetch_pack() call
  remote-curl: remove label indentation
  remote-curl: fix typo
2020-06-08 18:06:30 -07:00
Junio C Hamano ded44afa02 Merge branch 'bc/filter-process'
Code simplification and test coverage enhancement.

* bc/filter-process:
  t2060: add a test for switch with --orphan and --discard-changes
  builtin/checkout: simplify metadata initialization
2020-06-08 18:06:30 -07:00
Junio C Hamano dc57a9be5e Merge branch 'tb/commit-graph-no-check-oids'
Clean-up the commit-graph codepath.

* tb/commit-graph-no-check-oids:
  commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  t5318: reorder test below 'graph_read_expect'
  commit-graph.c: simplify 'fill_oids_from_commits'
  builtin/commit-graph.c: dereference tags in builtin
  builtin/commit-graph.c: extract 'read_one_commit()'
  commit-graph.c: peel refs in 'add_ref_to_set'
  commit-graph.c: show progress of finding reachable commits
  commit-graph.c: extract 'refs_cb_data'
2020-06-08 18:06:27 -07:00
Eric Sunshine c9b77f2cea worktree: factor out repeated string literal
For each worktree removed by "git worktree prune", it reports the reason
for the removal. All reasons share the common prefix "Removing
worktrees/%s:". As new removal reasons are added, this prefix needs to
be duplicated, which is error-prone and potentially cumbersome.
Therefore, factor out the common prefix.

Although this change seems to increase the "sentence lego quotient", it
should be reasonably safe, as the reason for removal is a distinct
clause, not strictly related to the prefix. Moreover, the "worktrees" in
"Removing worktrees/%s:" is a path literal which ought not be localized,
so by factoring it out, we can more easily avoid exposing that path
fragment to translators.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08 13:31:27 -07:00
Xin Li 98564d8059 sparse-checkout: upgrade repository to version 1 when enabling extension
The 'extensions' configuration variable gets special meaning in the new
repository version, so when enabling the extension we should upgrade the
repository to version 1.

Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 10:13:30 -07:00
Xin Li 01bbbbd9da fetch: allow adding a filter after initial clone
Retroactively adding a filter can be useful for existing shallow clones as
they allow users to see earlier change histories without downloading all
git objects in a regular --unshallow fetch.

Without this patch, users can make a clone partial by editing the
repository configuration to convert the remote into a promisor, like:

  git config core.repositoryFormatVersion 1
  git config extensions.partialClone origin
  git fetch --unshallow --filter=blob:none origin

Since the hard part of making this work is already in place and such
edits can be error-prone, teach Git to perform the required configuration
change automatically instead.

Note that this change does not modify the existing git behavior which
recognizes setting extensions.partialClone without changing
repositoryFormatVersion.

Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 10:13:30 -07:00
Elijah Newren b5bfc08a97 sparse-checkout: avoid staging deletions of all files
sparse-checkout's purpose is to update the working tree to have it
reflect a subset of the tracked files.  As such, it shouldn't be
switching branches, making commits, downloading or uploading data, or
staging or unstaging changes.  Other than updating the worktree, the
only thing sparse-checkout should touch is the SKIP_WORKTREE bit of the
index.  In particular, this sets up a nice invariant: running
sparse-checkout will never change the status of any file in `git status`
(reflecting the fact that we only set the SKIP_WORKTREE bit if the file
is safe to delete, i.e. if the file is unmodified).

Traditionally, we did a _really_ bad job with this goal.  The
predecessor to sparse-checkout involved manual editing of
.git/info/sparse-checkout and running `git read-tree -mu HEAD`.  That
command would stage and unstage changes and overwrite dirty changes in
the working tree.

The initial implementation of the sparse-checkout command was no better;
it simply invoked `git read-tree -mu HEAD` as a subprocess and had the
same caveats, though this issue came up repeatedly in review comments
and workarounds for the problems were put in place before the feature
was merged[1, 2, 3, 4, 5, 6; especially see 4 & 6].

[1] https://lore.kernel.org/git/CABPp-BFT9A5n=_bx5LsjCvbogqwSjiwgr5amcjgbU1iAk4KLJg@mail.gmail.com/
[2] https://lore.kernel.org/git/CABPp-BEmwSwg4tgJg6nVG8a3Hpn_g-=ZjApZF4EiJO+qVgu4uw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFV7TA0qwZCQpHCqx9N+JifyRyuBQ-pZ_oGfe-NOgyh7A@mail.gmail.com/
[4] https://lore.kernel.org/git/CABPp-BHYCCD+Vx5fq35jH82eHc1-P53Lz_aGNpHJNcx9kg2K-A@mail.gmail.com/
[5] https://lore.kernel.org/git/CABPp-BF+JWYZfDqp2Tn4AEKVp4b0YMA=Mbz4Nz62D-gGgiduYQ@mail.gmail.com/
[6] https://lore.kernel.org/git/20191121163706.GV23183@szeder.dev/

However, these workarounds, in addition to disabling the feature in a
number of important cases, also missed one special case.  I'll get back
to it later.

In the 2.27.0 cycle, the disabling of the feature was lifted by finally
replacing the internal equivalent of `git read-tree -mu HEAD` with
something that did what we wanted: the new update_sparsity() function in
unpack-trees.c that only ever updates SKIP_WORKTREE bits in the index
and updates the working tree to match.  This new function handles all
the cases that were problematic for the old implementation, except that
it breaks the same special case that avoided the workarounds of the old
implementation, but broke it in a different way.

So...that brings us to the special case: a git clone performed with
--no-checkout.  As per the meaning of the flag, --no-checkout does not
check out any branch, with the implication that you aren't on one and
need to switch to one after the clone.  Implementationally, HEAD is
still set (so in some sense you are partially on a branch), but
  * the index is "unborn" (non-existent)
  * there are no files in the working tree (other than .git/)
  * the next time git switch (or git checkout) is run it will run
    unpack_trees with `initial_checkout` flag set to true.
It is not until you run, e.g. `git switch <somebranch>` that the index
will be written and files in the working tree populated.

With this special --no-checkout case, the traditional `read-tree -mu
HEAD` behavior would have done the equivalent of acting like checkout --
switch to the default branch (HEAD), write out an index that matches
HEAD, and update the working tree to match.  This special case slipped
through the avoid-making-changes checks in the original sparse-checkout
command and thus continued there.

After update_sparsity() was introduced and used (see commit f56f31af03
("sparse-checkout: use new update_sparsity() function", 2020-03-27)),
the behavior for the --no-checkout case changed:  Due to git's
auto-vivification of an empty in-memory index (see do_read_index() and
note that `must_exist` is false), and due to sparse-checkout's
update_working_directory() code to always write out the index after it
was done, we got a new bug.  That made it so that sparse-checkout would
switch the repository from a clone with an "unborn" index (i.e. still
needing an initial_checkout), to one that had a recorded index with no
entries.  Thus, instead of all the files appearing deleted in `git
status` being known to git as a special artifact of not yet being on a
branch, our recording of an empty index made it suddenly look to git as
though it was definitely on a branch with ALL files staged for deletion!
A subsequent checkout or switch then had to contend with the fact that
it wasn't on an initial_checkout but had a bunch of staged deletions.

Make sure that sparse-checkout changes nothing in the index other than
the SKIP_WORKTREE bit; in particular, when the index is unborn we do not
have any branch checked out so there is no sparsification or
de-sparsification work to do.  Simply return from
update_working_directory() early.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 08:05:50 -07:00
Johannes Schindelin 46da295a77 clone/fetch: anonymize URLs in the reflog
Even if we strongly discourage putting credentials into the URLs passed
via the command-line, there _is_ support for that, and users _do_ do
that.

Let's scrub them before writing them to the reflog.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 13:20:21 -07:00
Junio C Hamano de82fb45db Merge branch 'rs/checkout-b-track-error'
The error message from "git checkout -b foo -t bar baz" was
confusing.

* rs/checkout-b-track-error:
  checkout: improve error messages for -b with extra argument
  checkout: add tests for -b and --track
2020-06-02 13:35:04 -07:00
Junio C Hamano 0739479c6a Merge branch 'an/merge-single-strategy-optim'
Code optimization for a common case.

* an/merge-single-strategy-optim:
  merge: optimization to skip evaluate_result for single strategy
2020-06-02 13:35:01 -07:00
Shourya Shukla 2964d6e5e1 submodule: port subcommand 'set-branch' from shell to C
Convert submodule subcommand 'set-branch' to a builtin and call it via
'git-submodule.sh'.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Denton Liu <liu.denton@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-02 10:51:54 -07:00
brian m. carlson d96dab868e builtin/ls-remote: initialize repository based on fetch
ls-remote may or may not operate within a repository, and as such will
not have been initialized with the repository's hash algorithm.  Even if
it were, the remote side could be using a different algorithm and we
would still want to display those refs properly.  Find the hash
algorithm used by the remote side by querying the transport object and
set our hash algorithm accordingly.

Without this change, if the remote side is using SHA-256, we truncate
the refs to 40 hex characters, since that's the length of the default
hash algorithm (SHA-1).

Note that technically this is not a correct setting of the repository
hash algorithm since, if we are in a repository, it might be one of a
different hash algorithm from the remote side.  However, our current
code paths don't handle multiple algorithms and won't for some time, so
this is the best we can do.  We rely on the fact that ls-remote never
modifies the current repository, which is a reasonable assumption to
make.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27 10:07:07 -07:00
brian m. carlson 88a09a557c builtin/show-index: provide options to determine hash algo
show-index is capable of reading any possible index file whether or not
the index is inside a repository.  However, because our index files lack
metadata about the hash algorithm in use, it's not possible to
autodetect the algorithm that a particular index file is using.

In order to allow us to read index files of any algorithm, let's set up
the .git directory gently so that we default to the algorithm for the
current repository, and add an --object-format option to allow users to
override this setting and continue to run show-index outside of a
repository altogether.  Let's also document this new option so that
people can find it and use it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27 10:07:07 -07:00
brian m. carlson 629dffc461 packfile: compute and use the index CRC offset
Both v2 pack index files and the v3 format specified as part of the
NewHash work have similar data starting at the CRC table.  Much of the
existing code wants to read either this table or the offset entries
following it, and in doing so computes the offset each time.

In order to share as much code between v2 and v3, compute the offset of
the CRC table and store it when the pack is opened.  Use this value to
compute offsets to not only the CRC table, but to the offset entries
beyond it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27 10:07:07 -07:00
brian m. carlson b65dc2cebd builtin/clone: initialize hash algorithm properly
When performing a clone, we don't know what hash algorithm the other end
will support.  Currently, we don't support fetching data belonging to a
different algorithm, so we must know what algorithm the remote side is
using in order to properly initialize the repository.  We can know that
only after fetching the refs, so if the remote side has any references,
use that information to reinitialize the repository with the correct
hash algorithm information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27 10:07:06 -07:00
brian m. carlson bb095d0875 builtin/receive-pack: detect when the server doesn't support our hash
Detect when the server doesn't support our hash algorithm and abort.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27 10:07:06 -07:00
brian m. carlson bf30dbf826 remote: advertise the object-format capability on the server side
Advertise the current hash algorithm in use by using the object-format
capability as part of the ref advertisement.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27 10:07:06 -07:00
Junio C Hamano d55a4ae71d Merge branch 'ds/multi-pack-verify'
Fix for a copy-and-paste error introduced during 2.20 era.

* ds/multi-pack-verify:
  fsck: use ERROR_MULTI_PACK_INDEX
2020-05-24 19:39:39 -07:00
Denton Liu b0df0c16ea stateless-connect: send response end packet
Currently, remote-curl acts as a proxy and blindly forwards packets
between an HTTP server and fetch-pack. In the case of a stateless RPC
connection where the connection is terminated before the transaction is
complete, remote-curl will blindly forward the packets before waiting on
more input from fetch-pack. Meanwhile, fetch-pack will read the
transaction and continue reading, expecting more input to continue the
transaction. This results in a deadlock between the two processes.

This can be seen in the following command which does not terminate:

	$ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012
	Cloning into 'git'...

whereas the v1 version does terminate as expected:

	$ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012
	Cloning into 'git'...
	fatal: the remote end hung up unexpectedly

Instead of blindly forwarding packets, make remote-curl insert a
response end packet after proxying the responses from the remote server
when using stateless_connect(). On the RPC client side, ensure that each
response ends as described.

A separate control packet is chosen because we need to be able to
differentiate between what the remote server sends and remote-curl's
control packets. By ensuring in the remote-curl code that a server
cannot send response end packets, we prevent a malicious server from
being able to perform a denial of service attack in which they spoof a
response end packet and cause the described deadlock to happen.

Reported-by: Force Charlie <charlieio@outlook.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24 16:26:00 -07:00
René Scharfe bb2198fb91 checkout: improve error messages for -b with extra argument
When we try to create a branch "foo" based on "origin/master" and give
git commit -b an extra unsupported argument "bar", it confusingly
reports:

   $ git checkout -b foo origin/master bar
   fatal: 'bar' is not a commit and a branch 'foo' cannot be created from it

   $ git checkout --track -b foo origin/master bar
   fatal: 'bar' is not a commit and a branch 'foo' cannot be created from it

That's wrong, because it very well understands that "origin/master" is
supposed to be the start point for the new branch and not "bar".  Check
if we got a commit and show more fitting messages in that case instead:

   $ git checkout -b foo origin/master bar
   fatal: Cannot update paths and switch to branch 'foo' at the same time.

   $ git checkout --track -b foo origin/master bar
   fatal: '--track' cannot be used with updating paths

Original-patch-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24 16:21:30 -07:00
Carlo Marcelo Arenas Belón 4d9005ff5d bisect--helper: avoid segfault with bad syntax in start --term-*
06f5608c14 (bisect--helper: `bisect_start` shell function partially in C,
2019-01-02) adds a lax parser for `git bisect start` which could result
in a segfault under a bad syntax call for start with custom terms.

Detect if there are enough arguments left in the command line to use for
--term-{old,good,new,bad} and abort with the same syntax error the original
implementation will show if not.

While at it, remove an unnecessary (and incomplete) check for unknown
arguments and make sure to add a test to avoid regressions.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24 09:00:11 -07:00
brian m. carlson 81861288a9 builtin/checkout: simplify metadata initialization
When we call init_checkout_metadata in reset_tree, we want to pass the
object ID of the commit in question so that it can be passed to filters,
or if there is no commit, the tree.  We anticipated this latter case,
which can occur elsewhere in the checkout code, but it cannot occur
here.  The only case in which we do not have a commit object is when
invoking git switch with --orphan.  Moreover, we can only hit this code
path without a commit object additionally with either --force or
--discard-changes.

In such a case, there is no point initializing the checkout metadata
with a commit or tree because (a) there is no commit, only the empty
tree, and (b) we will never use the data, since no files will be smudged
when checking out a branch with no files.  Pass the all-zeros object ID
in this case, since we just need some value which is a valid pointer.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-21 09:55:21 -07:00
Derrick Stolee e68a5272b1 fsck: use ERROR_MULTI_PACK_INDEX
The multi-pack-index was added to the data verified by git-fsck in
ea5ae6c3 "fsck: verify multi-pack-index". This implementation was
based on the implementation for verifying the commit-graph, and a
copy-paste error kept the ERROR_COMMIT_GRAPH flag as the bit set
when an error appears in the multi-pack-index.

Add a new flag, ERROR_MULTI_PACK_INDEX, and use that instead.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-19 16:13:22 -07:00
Andrew Ng 8777616e4d merge: optimization to skip evaluate_result for single strategy
For a merge with a single strategy, the result of evaluate_result() is
effectively not used and therefore is not needed, so avoid altogether.

On Windows, this optimization can halve the time required to perform a
recursive merge of a single commit with the LLVM repo.

Signed-off-by: Andrew Ng <andrew.ng@sony.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-19 15:35:46 -07:00
Taylor Blau 2f00c355cb commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
receiving non-commit OIDs as input to '--stdin-commits'.

This behavior can be cumbersome to work around in, say, the case of
piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
the caller does not want to cull out non-commits themselves. In this
situation, it would be ideal if 'git commit-graph write' wrote the graph
containing the inputs that did pertain to commits, and silently ignored
the remainder of the input.

Some options have been proposed to the effect of '--[no-]check-oids'
which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't
want to pass '--no-check-oids', suggesting that we should get rid of the
behavior of complaining about non-commit inputs altogether.

If callers do wish to retain this behavior, they can easily work around
this change by doing the following:

     git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
     awk '
       !/commit/ { print "not-a-commit:"$1 }
        /commit/ { print $1 }
     ' |
     git commit-graph write --stdin-commits

To make it so that valid OIDs that refer to non-existent objects are
indeed an error after loosening the error handling, perform an extra
lookup to make sure that object indeed exists before sending it to the
commit-graph internals.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:51:11 -07:00
Taylor Blau 5b6653e523 builtin/commit-graph.c: dereference tags in builtin
When given a list of commits, the commit-graph machinery calls
'lookup_commit_reference_gently()' on each element in the set and treats
the resulting set of OIDs as the base over which to close for
reachability.

In an earlier collection of commits, the 'git commit-graph write
--reachable' case made the inner-most call to
'lookup_commit_reference_gently()' by peeling references before they
were passed over to the commit-graph internals.

Do the analog for 'git commit-graph write --stdin-commits' by calling
'lookup_commit_reference_gently()' outside of the commit-graph
machinery, making the inner-most call a noop.

Since this may incur additional processing time, surround
'read_one_commit' with a progress meter to provide output to the caller.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:51:11 -07:00
Taylor Blau fa8953cb40 builtin/commit-graph.c: extract 'read_one_commit()'
With either '--stdin-commits' or '--stdin-packs', the commit-graph
builtin will read line-delimited input, and interpret it either as a
series of commit OIDs, or pack names.

In a subsequent commit, we will begin handling '--stdin-commits'
differently by processing each line as it comes in, instead of in one
shot at the end. To make adequate room for this additional logic, split
the '--stdin-commits' case from '--stdin-packs' by only storing the
input when '--stdin-packs' is given.

In the case of '--stdin-commits', feed each line to a new
'read_one_commit' helper, which (for now) will merely call
'parse_oid_hex'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-18 12:50:07 -07:00
Junio C Hamano 9e8ed173b4 Merge branch 'ss/submodule-set-url-in-c'
Rewriting various parts of "git submodule" in C continues.

* ss/submodule-set-url-in-c:
  submodule: port subcommand 'set-url' from shell to C
2020-05-13 12:19:20 -07:00
Junio C Hamano 3af459e48d Merge branch 'jc/auto-gc-quiet'
Teach "am", "commit", "merge" and "rebase", when they are run with
the "--quiet" option, to pass "--quiet" down to "gc --auto".

* jc/auto-gc-quiet:
  auto-gc: pass --quiet down from am, commit, merge and rebase
  auto-gc: extract a reusable helper from "git fetch"
2020-05-13 12:19:19 -07:00
Junio C Hamano 896833b268 Merge branch 'tb/shallow-cleanup'
Code cleanup.

* tb/shallow-cleanup:
  shallow: use struct 'shallow_lock' for additional safety
  shallow.h: document '{commit,rollback}_shallow_file'
  shallow: extract a header file for shallow-related functions
  commit: make 'commit_graft_pos' non-static
2020-05-13 12:19:18 -07:00
Junio C Hamano b9bcd76a9a Merge branch 'cb/avoid-colliding-with-netbsd-hmac'
The <stdlib.h> header on NetBSD brings in its own definition of
hmac() function (eek), which conflicts with our own and unrelated
function with the same name.  Our function has been renamed to work
around the issue.

* cb/avoid-colliding-with-netbsd-hmac:
  builtin/receive-pack: avoid generic function name hmac()
2020-05-08 14:25:09 -07:00
Junio C Hamano 4c2941a5fa Merge branch 'es/restore-staged-from-head-by-default'
"git restore --staged --worktree" now defaults to take the contents
out of "HEAD", instead of erring out.

* es/restore-staged-from-head-by-default:
  restore: default to HEAD when combining --staged and --worktree
2020-05-08 14:25:08 -07:00
Junio C Hamano 6de1630898 Merge branch 'jk/for-each-ref-multi-key-sort-fix'
"git branch" and other "for-each-ref" variants accepted multiple
--sort=<key> options in the increasing order of precedence, but it
had a few breakages around "--ignore-case" handling, and tie-breaking
with the refname, which have been fixed.

* jk/for-each-ref-multi-key-sort-fix:
  ref-filter: apply fallback refname sort only after all user sorts
  ref-filter: apply --ignore-case to all sorting keys
2020-05-08 14:25:04 -07:00
Junio C Hamano f4675f3d47 Merge branch 'dl/switch-c-option-in-error-message'
In error messages that "git switch" mentions its option to create a
new branch, "-b/-B" options were shown, where "-c/-C" options
should be, which has been corrected.

* dl/switch-c-option-in-error-message:
  switch: fix errors and comments related to -c and -C
2020-05-08 14:25:00 -07:00
Shourya Shukla 6417cf9c21 submodule: port subcommand 'set-url' from shell to C
Convert submodule subcommand 'set-url' to a builtin. Port 'set-url' to
'submodule--helper.c' and call the latter via 'git-submodule.sh'.

Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-08 09:17:55 -07:00
Junio C Hamano 7c3e9e8cfb auto-gc: pass --quiet down from am, commit, merge and rebase
These commands take the --quiet option for their own operation, but
they forget to pass the option down when they invoke "git gc --auto"
internally.

Teach them to do so using the run_auto_gc() helper we added in the
previous step.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-07 12:24:35 -07:00
Junio C Hamano 850b6edefa auto-gc: extract a reusable helper from "git fetch"
Back in 1991006c (fetch: convert argv_gc_auto to struct argv_array,
2014-08-16), we taught "git fetch --quiet" to pass the "--quiet"
option down to "gc --auto".  This issue, however, is not limited to
"fetch":

    $ git grep -e 'gc.*--auto' \*.c

finds hits in "am", "commit", "merge", and "rebase" and these
commands do not pass "--quiet" down to "gc --auto" when they
themselves are told to be quiet.

As a preparatory step, let's introduce a helper function
run_auto_gc(), that the caller can pass a boolean "quiet",
and redo the fix to "git fetch" using the helper.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-07 12:24:33 -07:00
Junio C Hamano b75dc16ae3 Merge branch 'dl/push-recurse-submodules-fix'
Code cleanup.

* dl/push-recurse-submodules-fix:
  push: unset PARSE_OPT_OPTARG for --recurse-submodules
2020-05-05 14:54:28 -07:00
Junio C Hamano 6652716200 Merge branch 'dl/opt-callback-cleanup'
Code cleanup.

* dl/opt-callback-cleanup:
  Use OPT_CALLBACK and OPT_CALLBACK_F
2020-05-05 14:54:27 -07:00
Eric Sunshine 088018e34d restore: default to HEAD when combining --staged and --worktree
By default, files are restored from the index for --worktree, and from
HEAD for --staged. When --worktree and --staged are combined, --source
must be specified to disambiguate the restore source[1], thus making it
cumbersome to restore a file in both the worktree and the index.

However, HEAD is also a reasonable default for --worktree when combined
with --staged, so make it the default anytime --staged is used (whether
combined with --worktree or not).

[1]: Due to an oversight, the --source requirement, though documented,
is not actually enforced.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-05 11:27:38 -07:00
Carlo Marcelo Arenas Belón 3013118eb8 builtin/receive-pack: avoid generic function name hmac()
fabec2c5c3 (builtin/receive-pack: switch to use the_hash_algo, 2019-08-18)
renames hmac_sha1 to hmac, as it was updated to use the hash function used
by git (which won't be sha1 in the future).

hmac() is provided by NetBSD >= 8 libc and therefore conflicts as shown by :

builtin/receive-pack.c:421:13: error: conflicting types for 'hmac'
 static void hmac(unsigned char *out,
             ^~~~
In file included from ./git-compat-util.h:172:0,
                 from ./builtin.h:4,
                 from builtin/receive-pack.c:1:
/usr/include/stdlib.h:305:10: note: previous declaration of 'hmac' was here
 ssize_t  hmac(const char *, const void *, size_t, const void *, size_t, void *,
          ^~~~

Rename it again to hmac_hash to reflect it will use the git's defined hash
function and avoid the conflict, while at it update a comment to better
describe the HMAC function that was used.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-05 11:26:25 -07:00
Jeff King 76f9e569ad ref-filter: apply --ignore-case to all sorting keys
All of the ref-filter users (for-each-ref, branch, and tag) take an
--ignore-case option which makes filtering and sorting case-insensitive.
However, this option was applied only to the first element of the
ref_sorting list. So:

  git for-each-ref --ignore-case --sort=refname

would do what you expect, but:

  git for-each-ref --ignore-case --sort=refname --sort=taggername

would sort the primary key (taggername) case-insensitively, but sort the
refname case-sensitively. We have two options here:

  - teach callers to set ignore_case on the whole list

  - replace the ref_sorting list with a struct that contains both the
    list of sorting keys, as well as options that apply to _all_
    keys

I went with the first one here, as it gives more flexibility if we later
want to let the users set the flag per-key (presumably through some
special syntax when defining the key; for now it's all or nothing
through --ignore-case).

The new test covers this by sorting on both tagger and subject
case-insensitively, which should compare "a" and "A" identically, but
still sort them before "b" and "B". We'll break ties by sorting on the
refname to give ourselves a stable output (this is actually supposed to
be done automatically, but there's another bug which will be fixed in
the next commit).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-04 13:41:20 -07:00
Junio C Hamano 2c42fb7653 Merge branch 'js/anonymise-push-url-in-errors'
Error and verbose trace messages from "git push" did not redact
credential material embedded in URLs.

* js/anonymise-push-url-in-errors:
  push: anonymize URLs in error messages and warnings
2020-05-01 13:39:59 -07:00
Junio C Hamano dd094c2b75 Merge branch 'es/bugreport'
The "bugreport" tool.

* es/bugreport:
  bugreport: drop extraneous includes
  bugreport: add compiler info
  bugreport: add uname info
  bugreport: gather git version and build info
  bugreport: add tool to generate debugging info
  help: move list_config_help to builtin/help
2020-05-01 13:39:59 -07:00
Junio C Hamano 6d6b412da3 Merge branch 'en/rebase-root-and-fork-point-are-incompatible'
Incompatible options "--root" and "--fork-point" of "git rebase"
have been marked and documented as being incompatible.

* en/rebase-root-and-fork-point-are-incompatible:
  rebase: display an error if --root and --fork-point are both provided
2020-05-01 13:39:58 -07:00
Junio C Hamano 6d56d4c7dc Merge branch 'ds/blame-on-bloom'
"git blame" learns to take advantage of the "changed-paths" Bloom
filter stored in the commit-graph file.

* ds/blame-on-bloom:
  test-bloom: check that we have expected arguments
  test-bloom: fix some whitespace issues
  blame: drop unused parameter from maybe_changed_path
  blame: use changed-path Bloom filters
  tests: write commit-graph with Bloom filters
  revision: complicated pathspecs disable filters
2020-05-01 13:39:54 -07:00
Junio C Hamano 9b6606f43d Merge branch 'gs/commit-graph-path-filter'
Introduce an extension to the commit-graph to make it efficient to
check for the paths that were modified at each commit using Bloom
filters.

* gs/commit-graph-path-filter:
  bloom: ignore renames when computing changed paths
  commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag
  t4216: add end to end tests for git log with Bloom filters
  revision.c: add trace2 stats around Bloom filter usage
  revision.c: use Bloom filters to speed up path based revision walks
  commit-graph: add --changed-paths option to write subcommand
  commit-graph: reuse existing Bloom filters during write
  commit-graph: write Bloom filters to commit graph file
  commit-graph: examine commits by generation number
  commit-graph: examine changed-path objects in pack order
  commit-graph: compute Bloom filters for changed paths
  diff: halt tree-diff early after max_changes
  bloom.c: core Bloom filter implementation for changed paths.
  bloom.c: introduce core Bloom filter constructs
  bloom.c: add the murmur3 hash implementation
  commit-graph: define and use MAX_NUM_CHUNKS
2020-05-01 13:39:53 -07:00
Junio C Hamano 6a1c17d05b Merge branch 'tb/commit-graph-split-strategy'
"git commit-graph write" learned different ways to write out split
files.

* tb/commit-graph-split-strategy:
  Revert "commit-graph.c: introduce '--[no-]check-oids'"
  commit-graph.c: introduce '--[no-]check-oids'
  commit-graph.h: replace 'commit_hex' with 'commits'
  oidset: introduce 'oidset_size'
  builtin/commit-graph.c: introduce split strategy 'replace'
  builtin/commit-graph.c: introduce split strategy 'no-merge'
  builtin/commit-graph.c: support for '--split[=<strategy>]'
  t/helper/test-read-graph.c: support commit-graph chains
2020-05-01 13:39:52 -07:00
Junio C Hamano 2b4ff3d3dc Merge branch 'tb/reset-shallow'
Fix in-core inconsistency after fetching into a shallow repository
that broke the code to write out commit-graph.

* tb/reset-shallow:
  shallow.c: use '{commit,rollback}_shallow_file'
  t5537: use test_write_lines and indented heredocs for readability
2020-05-01 13:39:51 -07:00
Taylor Blau cac4b8e22e shallow: use struct 'shallow_lock' for additional safety
In previous patches, the functions 'commit_shallow_file' and
'rollback_shallow_file' were introduced to reset the shallowness
validity checks on a repository after potentially modifying
'.git/shallow'.

These functions can be made safer by wrapping the 'struct lockfile *' in
a new type, 'shallow_lock', so that they cannot be called with a raw
lock (and potentially misused by other code that happens to possess a
lockfile, but has nothing to do with shallowness).

This patch introduces that type as a thin wrapper around 'struct
lockfile', and updates the two aforementioned functions and their
callers to use it.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30 14:19:13 -07:00
Taylor Blau 120ad2b0f1 shallow: extract a header file for shallow-related functions
There are many functions in commit.h that are more related to shallow
repositories than they are to any sort of generic commit machinery.
Likely this began when there were only a few shallow-related functions,
and commit.h seemed a reasonable enough place to put them.

But, now there are a good number of shallow-related functions, and
placing them all in 'commit.h' doesn't make sense.

This patch extracts a 'shallow.h', which takes all of the declarations
from 'commit.h' for functions which already exist in 'shallow.c'. We
will bring the remaining shallow-related functions defined in 'commit.c'
in a subsequent patch.

For now, move only the ones that already are implemented in 'shallow.c',
and update the necessary includes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30 14:19:13 -07:00
Denton Liu 7c16ef7577 switch: fix errors and comments related to -c and -C
In d787d311db (checkout: split part of it to new command 'switch',
2019-03-29), the `git switch` command was created by extracting the
common functionality of cmd_checkout() in checkout_main(). However, in
b7b5fce270 (switch: better names for -b and -B, 2019-03-29), the branch
creation and force creation options for 'switch' were changed to -c and
-C, respectively. As a result of this, error messages and comments that
previously referred to `-b` and `-B` became invalid for `git switch`.

For error messages that refer to `-b` and `-B`, use a format string
instead so that `-c` and `-C` can be printed when `git switch` is
invoked.

Reported-by: Robert Simpson
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30 13:43:31 -07:00
Junio C Hamano d2ea03ddee Merge branch 'ps/transactional-update-ref-stdin'
"git update-ref --stdin" learned a handful of new verbs to let the
user control ref update transactions more explicitly, which helps
as an ingredient to implement two-phase commit-style atomic
ref-updates across multiple repositories.

* ps/transactional-update-ref-stdin:
  update-ref: implement interactive transaction handling
  update-ref: read commands in a line-wise fashion
  update-ref: move transaction handling into `update_refs_stdin()`
  update-ref: pass end pointer instead of strbuf
  update-ref: drop unused argument for `parse_refname`
  update-ref: organize commands in an array
  strbuf: provide function to append whole lines
  git-update-ref.txt: add missing word
  refs: fix segfault when aborting empty transaction
2020-04-29 16:15:31 -07:00
Junio C Hamano 6eacc39b6d Merge branch 'en/fill-directory-exponential'
The directory traversal code had redundant recursive calls which
made its performance characteristics exponential with respect to
the depth of the tree, which was corrected.

* en/fill-directory-exponential:
  completion: fix 'git add' on paths under an untracked directory
  Fix error-prone fill_directory() API; make it only return matches
  dir: replace double pathspec matching with single in treat_directory()
  dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory()
  dir: replace exponential algorithm with a linear one
  dir: refactor treat_directory to clarify control flow
  dir: fix confusion based on variable tense
  dir: fix broken comment
  dir: consolidate treat_path() and treat_one_path()
  dir: fix simple typo in comment
  t3000: add more testcases testing a variety of ls-files issues
  t7063: more thorough status checking
2020-04-29 16:15:31 -07:00
Junio C Hamano 48eee46d6a Merge branch 'en/sparse-checkout'
"sparse-checkout" UI improvements.

* en/sparse-checkout:
  sparse-checkout: provide a new reapply subcommand
  unpack-trees: failure to set SKIP_WORKTREE bits always just a warning
  unpack-trees: provide warnings on sparse updates for unmerged paths too
  unpack-trees: make sparse path messages sound like warnings
  unpack-trees: split display_error_msgs() into two
  unpack-trees: rename ERROR_* fields meant for warnings to WARNING_*
  unpack-trees: move ERROR_WOULD_LOSE_SUBMODULE earlier
  sparse-checkout: use improved unpack_trees porcelain messages
  sparse-checkout: use new update_sparsity() function
  unpack-trees: add a new update_sparsity() function
  unpack-trees: pull sparse-checkout pattern reading into a new function
  unpack-trees: do not mark a dirty path with SKIP_WORKTREE
  unpack-trees: allow check_updates() to work on a different index
  t1091: make some tests a little more defensive against failures
  unpack-trees: simplify pattern_list freeing
  unpack-trees: simplify verify_absent_sparse()
  unpack-trees: remove unused error type
  unpack-trees: fix minor typo in comment
2020-04-29 16:15:30 -07:00
Junio C Hamano 3afdeef33e Merge branch 'dl/merge-autostash-rebase-quit-fix'
The stash entry created by "git rebase --autosquash" to keep the
initial dirty state were discarded by mistake upon "git rebase
--quit", which has been corrected.

* dl/merge-autostash-rebase-quit-fix:
  rebase: save autostash entry into stash reflog on --quit
2020-04-29 16:15:27 -07:00
Junio C Hamano bf10200871 Merge branch 'dl/merge-autostash'
"git merge" learns the "--autostash" option.

* dl/merge-autostash: (22 commits)
  pull: pass --autostash to merge
  t5520: make test_pull_autostash() accept expect_parent_num
  merge: teach --autostash option
  sequencer: implement apply_autostash_oid()
  sequencer: implement save_autostash()
  sequencer: unlink autostash in apply_autostash()
  sequencer: extract perform_autostash() from rebase
  rebase: generify create_autostash()
  rebase: extract create_autostash()
  reset: extract reset_head() from rebase
  rebase: generify reset_head()
  rebase: use apply_autostash() from sequencer.c
  sequencer: rename stash_sha1 to stash_oid
  sequencer: make apply_autostash() accept a path
  rebase: use read_oneliner()
  sequencer: make read_oneliner() extern
  sequencer: configurably warn on non-existent files
  sequencer: make read_oneliner() accept flags
  sequencer: make file exists check more efficient
  sequencer: stop leaking buf
  ...
2020-04-29 16:15:27 -07:00
Junio C Hamano dbd5e0a186 Revert "commit-graph.c: introduce '--[no-]check-oids'"
This reverts commit 7a9ce0269b,
which has not yet gained consensus.
2020-04-29 12:44:40 -07:00
Junio C Hamano 33a1060988 Merge branch 'mt/grep-cquote-path'
"git grep" did not quote a path with unusual character like other
commands (like "git diff", "git status") do, but did quote when run
from a subdirectory, both of which has been corrected.

* mt/grep-cquote-path:
  grep: follow conventions for printing paths w/ unusual chars
2020-04-28 15:50:09 -07:00
Junio C Hamano d3fc8dc53a Merge branch 'ds/log-exclude-decoration-config'
The "--decorate-refs" and "--decorate-refs-exclude" options "git
log" takes have learned a companion configuration variable
log.excludeDecoration that sits at the lowest priority in the
family.

* ds/log-exclude-decoration-config:
  log: add log.excludeDecoration config option
  log-tree: make ref_filter_match() a helper method
2020-04-28 15:50:08 -07:00
Junio C Hamano 5a96715eb7 Merge branch 'tb/diff-tree-with-notes'
"git diff-tree --pretty --notes" used to hit an assertion failure,
as it forgot to initialize the notes subsystem.

* tb/diff-tree-with-notes:
  diff-tree.c: load notes machinery when required
2020-04-28 15:50:07 -07:00
Junio C Hamano e81ecff10a Merge branch 'js/stash-p-fix'
Allowing the user to split a patch hunk while "git stash -p" does
not work well; a band-aid has been added to make this (partially)
work better.

* js/stash-p-fix:
  stash -p: (partially) fix bug concerning split hunks
  t3904: fix incorrect demonstration of a bug
2020-04-28 15:50:06 -07:00
Junio C Hamano 56a1d9ca6b Merge branch 'dl/libify-a-few'
Code in builtin/*, i.e. those can only be called from within
built-in subcommands, that implements bulk of a couple of
subcommands have been moved to libgit.a so that they could be used
by others.

* dl/libify-a-few:
  Lib-ify prune-packed
  Lib-ify fmt-merge-msg
2020-04-28 15:50:05 -07:00
Junio C Hamano 8f5dc5a4af Merge branch 'jt/avoid-prefetch-when-able-in-diff'
"git diff" in a partial clone learned to avoid lazy loading blob
objects in more casese when they are not needed.

* jt/avoid-prefetch-when-able-in-diff:
  diff: restrict when prefetching occurs
  diff: refactor object read
  diff: make diff_populate_filespec_options struct
  promisor-remote: accept 0 as oid_nr in function
2020-04-28 15:50:04 -07:00
Junio C Hamano 25b336421f Merge branch 'ds/commit-graph-expiry-fix'
"git commit-graph write --expire-time=<timestamp>" did not use the
given timestamp correctly, which has been corrected.

* ds/commit-graph-expiry-fix:
  commit-graph: fix buggy --expire-time option
2020-04-28 15:50:02 -07:00
Junio C Hamano 9404128b34 Merge branch 'jc/log-no-mailmap'
"git log" learns "--[no-]mailmap" as a synonym to "--[no-]use-mailmap"

* jc/log-no-mailmap:
  log: give --[no-]use-mailmap a more sensible synonym --[no-]mailmap
  clone: reorder --recursive/--recurse-submodules
  parse-options: teach "git cmd -h" to show alias as alias
2020-04-28 15:50:00 -07:00
Junio C Hamano 342bc9e29f Merge branch 'jk/config-use-size-t'
The config API made mixed uses of int and size_t types to represent
length of various pieces of text it parsed, which has been updated
to use the correct type (i.e. size_t) throughout.

* jk/config-use-size-t:
  config: reject parsing of files over INT_MAX
  config: use size_t to store parsed variable baselen
  git_config_parse_key(): return baselen as size_t
  config: drop useless length variable in write_pair()
  parse_config_key(): return subsection len as size_t
  remote: drop auto-strlen behavior of make_branch() and make_rewrite()
2020-04-28 15:49:58 -07:00
Junio C Hamano 2abd648b17 Merge branch 'bc/constant-memequal'
Validation of push certificate has been made more robust against
timing attacks.

* bc/constant-memequal:
  receive-pack: compilation fix
  builtin/receive-pack: use constant-time comparison for HMAC value
2020-04-28 15:49:57 -07:00
Johannes Schindelin d192fa5006 push: anonymize URLs in error messages and warnings
Just like 47abd85ba0 (fetch: Strip usernames from url's before storing
them, 2009-04-17) and later 882d49ca5c (push: anonymize URL in status
output, 2016-07-13), and even later c1284b21f2 (curl: anonymize URLs
in error messages and warnings, 2019-03-04) this change anonymizes URLs
(read: strips them of user names and especially passwords) in
user-facing error messages and warnings.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28 15:17:45 -07:00
Denton Liu 9b2df3e8d0 rebase: save autostash entry into stash reflog on --quit
In a03b55530a (merge: teach --autostash option, 2020-04-07), the
--autostash option was introduced for `git merge`. Notably, when
`git merge --quit` is run with an autostash entry present, it is saved
into the stash reflog. This is contrasted with the current behaviour of
`git rebase --quit` where the autostash entry is simply just dropped out
of existence.

Adopt the behaviour of `git merge --quit` in `git rebase --quit` and
save the autostash entry into the stash reflog instead of just deleting
it.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28 12:35:38 -07:00
Denton Liu ce9baf234f push: unset PARSE_OPT_OPTARG for --recurse-submodules
When the usage for `git push` is shown, it includes the following
lines

	--recurse-submodules[=(check|on-demand|no)]
			      control recursive pushing of submodules

which seem to indicate that the argument for --recurse-submodules is
optional. However, we cannot actually run that optiion without an
argument:

	$ git push --recurse-submodules
	fatal: recurse-submodules missing parameter

Unset PARSE_OPT_OPTARG so that it is clear that this option requires an
argument. Since the parse-options machinery guarantees that an argument
is present now, assume that `arg` is set in the else of
option_parse_recurse_submodules().

Reported-by: Andrew White <andrew.white@audinate.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28 10:47:42 -07:00
Denton Liu 203c85339f Use OPT_CALLBACK and OPT_CALLBACK_F
In the codebase, there are many options which use OPTION_CALLBACK in a
plain ol' struct definition. However, we have the OPT_CALLBACK and
OPT_CALLBACK_F macros which are meant to abstract these plain struct
definitions away. These macros are useful as they semantically signal to
developers that these are just normal callback option with nothing fancy
happening.

Replace plain struct definitions of OPTION_CALLBACK with OPT_CALLBACK or
OPT_CALLBACK_F where applicable. The heavy lifting was done using the
following (disgusting) shell script:

	#!/bin/sh

	do_replacement () {
		tr '\n' '\r' |
			sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\s*0,\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK(\1,\2,\3,\4,\5,\6)/g' |
			sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK_F(\1,\2,\3,\4,\5,\6,\7)/g' |
			tr '\r' '\n'
	}

	for f in $(git ls-files \*.c)
	do
		do_replacement <"$f" >"$f.tmp"
		mv "$f.tmp" "$f"
	done

The result was manually inspected and then reformatted to match the
style of the surrounding code. Finally, using
`git grep OPTION_CALLBACK \*.c`, leftover results which were not handled
by the script were manually transformed.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-28 10:47:10 -07:00
Elijah Newren a35413c378 rebase: display an error if --root and --fork-point are both provided
--root implies we want to rebase all commits since the beginning of
history.  --fork-point means we want to use the reflog of the specified
upstream to find the best common ancestor between <upstream> and
<branch> and only rebase commits since that common ancestor.  These
options are clearly contradictory, so throw an error (instead of
segfaulting on a NULL pointer) if both are specified.

Reported-by: Alexander Berg <alexander.berg@atos.net>
Documentation-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-27 11:51:26 -07:00