Commit graph

62322 commits

Author SHA1 Message Date
Junio C Hamano 845d6030f8 Merge branch 'jc/diffcore-rotate'
"git {diff,log} --{skip,rotate}-to=<path>" allows the user to
discard diff output for early paths or move them to the end of the
output.

* jc/diffcore-rotate:
  diff: --{rotate,skip}-to=<path>
2021-02-25 16:43:30 -08:00
Junio C Hamano 3da165ca28 Merge branch 'mt/checkout-index-corner-cases'
The error codepath around the "--temp/--prefix" feature of "git
checkout-index" has been improved.

* mt/checkout-index-corner-cases:
  checkout-index: omit entries with no tempname from --temp output
  write_entry(): fix misuses of `path` in error messages
2021-02-25 16:43:30 -08:00
Junio C Hamano f47c3328ef Merge branch 'js/doc-proto-v2-response-end'
Docfix.

* js/doc-proto-v2-response-end:
  doc: fix naming of response-end-pkt
2021-02-25 16:43:30 -08:00
Junio C Hamano 18decfd11d Merge branch 'rs/blame-optim'
Optimization in "git blame"

* rs/blame-optim:
  blame: remove unnecessary use of get_commit_info()
2021-02-25 16:43:29 -08:00
Junio C Hamano d590ae5560 Merge branch 'mz/doc-notes-are-not-anchors'
Objects that lost references can be pruned away, even when they
have notes attached to it (and these notes will become dangling,
which in turn can be pruned with "git notes prune").  This has been
clarified in the documentation.

* mz/doc-notes-are-not-anchors:
  docs: clarify that refs/notes/ do not keep the attached objects alive
2021-02-25 16:43:29 -08:00
Junio C Hamano 608cc4f273 Merge branch 'ab/detox-gettext-tests'
Removal of GIT_TEST_GETTEXT_POISON continues.

* ab/detox-gettext-tests:
  tests: remove most uses of test_i18ncmp
  tests: remove last uses of C_LOCALE_OUTPUT
  tests: remove most uses of C_LOCALE_OUTPUT
  tests: remove last uses of GIT_TEST_GETTEXT_POISON=false
2021-02-25 16:43:29 -08:00
Junio C Hamano 6fe12b5215 Merge branch 'jk/rev-list-disk-usage'
"git rev-list" command learned "--disk-usage" option.

* jk/rev-list-disk-usage:
  docs/rev-list: add some examples of --disk-usage
  docs/rev-list: add an examples section
  rev-list: add --disk-usage option for calculating disk usage
  t: add --no-tag option to test_commit
2021-02-25 16:43:29 -08:00
Junio C Hamano 7dd0eaa39c index-format doc: camelCase core.excludesFile
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 15:21:25 -08:00
Junio C Hamano edaf10dd26 blame-options.txt: camelcase blame.blankBoundary
All other references to blame.* configuration variables are
camelCased already.  Update this one to match.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 15:21:25 -08:00
Denton Liu 77645b5daa i18n.txt: camel case and monospace "i18n.commitEncoding"
In 95791be750 (doc: camelCase the i18n config variables to improve
readability, 2017-07-17), the other i18n config variables were
camel cased. However, this one instance was missed.

Camel case and monospace "i18n.commitEncoding" so that it matches the
surrounding text.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
[jc: fixed 3 other mistakes that are exactly the same]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 15:21:25 -08:00
Neeraj Singh f279894d28 read-cache: make the index write buffer size 128K
Writing an index 8K at a time invokes the OS filesystem and caching code
very frequently, introducing noticeable overhead while writing large
indexes. When experimenting with different write buffer sizes on Windows
writing the Windows OS repo index (260MB), most of the benefit came by
bumping the index write buffer size to 64K. I picked 128K to ensure that
we're past the knee of the curve.

With this change, the time under do_write_index for an index with 3M
files goes from ~1.02s to ~0.72s.

Signed-off-by: Neeraj Singh <neerajsi@ntdev.microsoft.com>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 13:40:30 -08:00
Matheus Tavares 9ebd7fe158 add: propagate --chmod errors to exit status
If `add` encounters an error while applying the --chmod changes, it
prints a message to stderr, but exits with a success code. This might
have been an oversight, as the command does exit with a non-zero code in
other situations where it cannot (or refuses to) update all of the
requested paths (e.g. when some of the given paths are ignored). So make
the exit behavior more consistent by also propagating --chmod errors to
the exit status.

Note: the test "all statuses changed in folder if . is given" uses paths
added by previous test cases, some of which might be symbolic links.
Because `git add --chmod` will now fail with such paths, this test would
depend on whether all the previous tests were executed, or only some
of them. Avoid that by running the test on a fresh repo with only
regular files.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 12:14:51 -08:00
Matheus Tavares 48960894f5 add: mark --chmod error string for translation
This error message is intended for humans, so mark it for translation.
Also use error() instead of fprintf(stderr, ...), to make the
corresponding line a bit cleaner, and to display the "error:" prefix,
which helps classifying the nature/severity of the message.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 12:14:51 -08:00
Matheus Tavares c937d70bfb add --chmod: don't update index when --dry-run is used
`git add --chmod` applies the mode changes even when `--dry-run` is
used. Fix that and add some tests for this option combination.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 12:14:51 -08:00
Jeff Hostetler 6347d649bc dir: fix malloc of root untracked_cache_dir
Use FLEX_ALLOC_STR() to allocate the `struct untracked_cache_dir`
for the root directory.  Get rid of unsafe code that might fail to
initialize the `name` field (if FLEX_ARRAY is not 1).  This will
make it clear that we intend to have a structure with an empty
string following it.

A problem was observed on Windows where the length of the memset() was
too short, so the first byte of the name field was not zeroed.  This
resulted in the name field having garbage from a previous use of that
area of memory.

The record for the root directory was then written to the untracked-cache
extension in the index.  This garbage would then be visible to future
commands when they reloaded the untracked-cache extension.

Since the directory record for the root directory had garbage in the
`name` field, the `t/helper/test-tool dump-untracked-cache` tool
printed this garbage as the path prefix (rather than '/') for each
directory in the untracked cache as it recursed.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 12:09:10 -08:00
Alex Henrie 2803d800d2 rebase: add a config option for --no-fork-point
Some users (myself included) would prefer to have this feature off by
default because it can silently drop commits.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 11:49:10 -08:00
Taylor Blau c4ff24bbb3 commit-graph.c: display correct number of chunks when writing
When writing a commit-graph, a progress meter is shown which indicates
the number of pieces of data to write (one per commit in each chunk).

In 47410aa837 (commit-graph: use chunk-format write API, 2021-02-18),
the number of chunks became tracked by the new chunk-format API. But a
stray local variable was left behind from when write_commit_graph_file()
used to keep track of the same.

Since this was no longer updated after 47410aa837, the progress meter
appeared broken:

    $ git commit-graph write --reachable
    Expanding reachable commits in commit graph: 837569, done.
    Writing out commit graph in 3 passes: 166% (4187845/2512707), done.

Drop the local variable and rely instead on the chunk-format API to tell
us the correct number of chunks.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-24 11:44:34 -08:00
Junio C Hamano 20e416409f push: do not turn --delete '' into a matching push
When we added a syntax sugar "git push remote --delete <ref>" to
"git push" as a synonym to the canonical "git push remote :<ref>"
syntax at f517f1f2 (builtin-push: add --delete as syntactic sugar
for :foo, 2009-12-30), we weren't careful enough to make sure that
<ref> is not empty.

Blindly rewriting "--delete <ref>" to ":<ref>" means that an empty
string <ref> results in refspec ":", which is the syntax to ask for
"matching" push that does not delete anything.

Worse yet, if there were matching refs that can be fast-forwarded,
they would have been published prematurely, even if the user feels
that they are not ready yet to be pushed out, which would be a real
disaster.

Noticed-by: Tilman Vogel <tilman.vogel@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-23 15:19:34 -08:00
Jeff King 01168a9d89 doc: mention approxidates for git-commit --date
We describe the more strict date formats accepted by GIT_COMMITTER_DATE,
etc, but the --date option also allows the looser approxidate formats,
as well. Unfortunately we don't have a good or complete reference for
this format, but let's at least mention that it _is_ looser, and give a
few examples.

If we ever write separate, more complete date-format documentation, we
should refer to it from here.

Based-on-a-patch-by: Utku Gultopu <ugultopu@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-23 13:33:02 -08:00
Johannes Sixt b865734760 replace "parameters" by "arguments" in error messages
When an error message informs the user about an incorrect command
invocation, it should refer to "arguments", not "parameters".

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-23 13:30:45 -08:00
Seth House 30bb8088af mergetools/vimdiff: add vimdiff1 merge tool variant
This adds yet another vimdiff/gvimdiff variant and presents conflicts as
a two-way diff between 'LOCAL' and 'REMOTE'. 'MERGED' is not opened
which deviates from the norm so usage text is echoed as a Vim message on
startup that instructs the user with how to proceed and how to abort.

Vimdiff is well-suited to two-way diffs so this is an option for a more
simple, more streamlined conflict resolution. For example: it is
difficult to communicate differences across more than two files using
only syntax highlighting; default vimdiff commands to get and put
changes between buffers do not need the user to manually specify
a source or destination buffer when only using two buffers.

Like other merge tools that directly compare 'LOCAL' with 'REMOTE', this
tool will benefit when paired with the new `mergetool.hideResolved`
setting.

Signed-off-by: Seth House <seth@eseth.com>
Tested-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-23 11:37:13 -08:00
Han-Wen Nienhuys 00f68732e5 doc/reftable: document how to handle windows
On Windows we can't delete or overwrite files opened by other processes. Here we
sketch how to handle this situation.

We propose to use a random element in the filename. It's possible to design an
alternate solution based on counters, but that would assign semantics to the
filenames that complicates implementation.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-23 10:01:21 -08:00
Eric Sunshine 26c7974376 maintenance: fix incorrect maintenance.repo path with bare repository
The periodic maintenance tasks configured by `git maintenance start`
invoke `git for-each-repo` to run `git maintenance run` on each path
specified by the multi-value global configuration variable
`maintenance.repo`. Because `git for-each-repo` will likely be run
outside of the repositories which require periodic maintenance, it is
mandatory that the repository paths specified by `maintenance.repo` are
absolute.

Unfortunately, however, `git maintenance register` does nothing to
ensure that the paths it assigns to `maintenance.repo` are indeed
absolute, and may in fact -- especially in the case of a bare repository
-- assign a relative path to `maintenance.repo` instead. Fix this
problem by converting all paths to absolute before assigning them to
`maintenance.repo`.

While at it, also fix `git maintenance unregister` to convert paths to
absolute, as well, in order to ensure that it can correctly remove from
`maintenance.repo` a path assigned via `git maintenance register`.

Reported-by: Clement Moyroud <clement.moyroud@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-23 00:22:45 -08:00
Junio C Hamano 966e671106 The tenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 16:12:43 -08:00
Junio C Hamano d68fccef86 Merge branch 'ab/test-lib'
Test framework clean-up.

* ab/test-lib:
  test-lib-functions: assert correct parameter count
  test-lib-functions: remove bug-inducing "diagnostics" helper param
  test libs: rename "diff-lib" to "lib-diff"
  t/.gitattributes: sort lines
  test-lib-functions: move function to lib-bitmap.sh
  test libs: rename gitweb-lib.sh to lib-gitweb.sh
  test libs: rename bundle helper to "lib-bundle.sh"
  test-lib-functions: remove generate_zero_bytes() wrapper
  test-lib-functions: move test_set_index_version() to its user
  test lib: change "error" to "BUG" as appropriate
  test-lib: remove check_var_migration
2021-02-22 16:12:43 -08:00
Junio C Hamano 45df6c4d75 Merge branch 'ab/diff-deferred-free'
A small memleak in "diff -I<regexp>" has been corrected.

* ab/diff-deferred-free:
  diff: plug memory leak from regcomp() on {log,diff} -I
  diff: add an API for deferred freeing
2021-02-22 16:12:43 -08:00
Junio C Hamano dcb11fc622 Merge branch 'ab/pager-exit-log'
When a pager spawned by us exited, the trace log did not record its
exit status correctly, which has been corrected.

* ab/pager-exit-log:
  pager: properly log pager exit code when signalled
  run-command: add braces for "if" block in wait_or_whine()
  pager: test for exit code with and without SIGPIPE
  pager: refactor wait_for_pager() function
2021-02-22 16:12:43 -08:00
Junio C Hamano dc24948be9 Merge branch 'ta/hash-function-transition-doc'
Update formatting and grammar of the hash transition plan
documentation, plus some updates.

* ta/hash-function-transition-doc:
  doc: use https links
  doc hash-function-transition: move rationale upwards
  doc hash-function-transition: fix incomplete sentence
  doc hash-function-transition: use upper case consistently
  doc hash-function-transition: use SHA-1 and SHA-256 consistently
  doc hash-function-transition: fix asciidoc output
2021-02-22 16:12:43 -08:00
Junio C Hamano 15af6e6fee Merge branch 'bc/signed-objects-with-both-hashes'
Signed commits and tags now allow verification of objects, whose
two object names (one in SHA-1, the other in SHA-256) are both
signed.

* bc/signed-objects-with-both-hashes:
  gpg-interface: remove other signature headers before verifying
  ref-filter: hoist signature parsing
  commit: allow parsing arbitrary buffers with headers
  gpg-interface: improve interface for parsing tags
  commit: ignore additional signatures when parsing signed commits
  ref-filter: switch some uses of unsigned long to size_t
2021-02-22 16:12:42 -08:00
Junio C Hamano b9554c03a0 Merge branch 'dl/stash-cleanup'
Documentation, code and test clean-up around "git stash".

* dl/stash-cleanup:
  stash: declare ref_stash as an array
  t3905: use test_cmp() to check file contents
  t3905: replace test -s with test_file_not_empty
  t3905: remove nested git in command substitution
  t3905: move all commands into test cases
  t3905: remove spaces after redirect operators
  git-stash.txt: be explicit about subcommand options
2021-02-22 16:12:42 -08:00
Andrzej Hunt bf4bb9f9f5 commit-graph: avoid leaking topo_levels slab in write_commit_graph()
write_commit_graph initialises topo_levels using init_topo_level_slab(),
next it calls compute_topological_levels() which can cause the slab to
grow, we therefore need to clear the slab again using
clear_topo_level_slab() when we're done.

First introduced in 72a2bfca (commit-graph: add a slab to store
topological levels, 2021-01-16).

LeakSanitizer output:

==1026==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x498ae9 in realloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xafbed8 in xrealloc /src/git/wrapper.c:126:8
    #2 0x7966d1 in topo_level_slab_at_peek /src/git/commit-graph.c:71:1
    #3 0x7965e0 in topo_level_slab_at /src/git/commit-graph.c:71:1
    #4 0x78fbf5 in compute_topological_levels /src/git/commit-graph.c:1472:12
    #5 0x78c5c3 in write_commit_graph /src/git/commit-graph.c:2456:2
    #6 0x535c5f in graph_write /src/git/builtin/commit-graph.c:299:6
    #7 0x5350ca in cmd_commit_graph /src/git/builtin/commit-graph.c:337:11
    #8 0x4cddb1 in run_builtin /src/git/git.c:453:11
    #9 0x4cabe2 in handle_builtin /src/git/git.c:704:3
    #10 0x4cd084 in run_argv /src/git/git.c:771:4
    #11 0x4ca424 in cmd_main /src/git/git.c:902:19
    #12 0x707fb6 in main /src/git/common-main.c:52:11
    #13 0x7fee4249383f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)

Indirect leak of 524256 byte(s) in 1 object(s) allocated from:
    #0 0x498942 in calloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0xafc088 in xcalloc /src/git/wrapper.c:140:8
    #2 0x796870 in topo_level_slab_at_peek /src/git/commit-graph.c:71:1
    #3 0x7965e0 in topo_level_slab_at /src/git/commit-graph.c:71:1
    #4 0x78fbf5 in compute_topological_levels /src/git/commit-graph.c:1472:12
    #5 0x78c5c3 in write_commit_graph /src/git/commit-graph.c:2456:2
    #6 0x535c5f in graph_write /src/git/builtin/commit-graph.c:299:6
    #7 0x5350ca in cmd_commit_graph /src/git/builtin/commit-graph.c:337:11
    #8 0x4cddb1 in run_builtin /src/git/git.c:453:11
    #9 0x4cabe2 in handle_builtin /src/git/git.c:704:3
    #10 0x4cd084 in run_argv /src/git/git.c:771:4
    #11 0x4ca424 in cmd_main /src/git/git.c:902:19
    #12 0x707fb6 in main /src/git/common-main.c:52:11
    #13 0x7fee4249383f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f)

SUMMARY: AddressSanitizer: 524264 byte(s) leaked in 2 allocation(s).

Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:45:01 -08:00
ZheNing Hu 1c881026a1 difftool.c: learn a new way start at specified file
`git difftool` only allow us to select file to view in turn.
If there is a commit with many files and we exit in the middle,
we will have to traverse list again to get the file diff which
we want to see. Therefore,teach the command an option
`--skip-to=<path>` to allow the user to say that diffs for earlier
paths are not interesting (because they were already seen in an
earlier session) and start this session with the named path.

Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:35:49 -08:00
Derrick Stolee 41f3c9949f commit-reach: stale commits may prune generation further
The remove_redundant_with_gen() algorithm performs a depth-first-search
to find commits in the 'array' list, starting at the parents of each
commit in 'array'. The result is that commits in 'array' are marked
STALE when they are reachable from another commit in 'array'.

This depth-first-search is fast when commits lie on or near the
first-parent history of the higher commits. The search terminates early
if all but one commit becomes marked STALE.

However, it is possible that there are two independent commits with high
generation number. In that case, the depth-first-search might languish
by searching in lower generations due to the fixed min_generation used
throughout the method.

With the expectation that commits with lower generation are expected to
become STALE more often, we can optimize further by increasing that
min_generation boundary upon discovery of the commit with minimum
generation.

We must first sort the commits in 'array' by generation. We cannot sort
'array' itself since it must preserve relative order among the returned
results (see revision.c:mark_redundant_parents() for an example).

This simplifies the initialization of min_generation, but it also allows
us to increase the new min_generation when we find the commit with
smallest generation remaining.

This requires more than two commits in order to test, so I used the
Linux kernel repository with a few commits that are slightly off of the
first-parent history. I timed the following command:

  git merge-base --independent 2ecedd756908 d2360a398f0b \
	1253935ad801 160bab43419e 0e2209629fec 1d0e16ac1a9e

The first two commits have similar generation and are near the v5.10
tag. Commit 160bab43419e is off of the first-parent history behind v5.5,
while the others are scattered somewhere reachable from v5.9. This is
designed to demonstrate the optimization, as that commit within v5.5
would normally cause a lot of extra commit walking.

Since remove_redundant_with_alg() is called only when at least one of
the input commits has a finite generation number, this algorithm is
tested with a commit-graph generated starting at a number of different
tags, the earliest being v5.5.

commit-graph at v5.5:

 | Method                | Time  |
 |-----------------------+-------|
 | *_no_gen()            | 864ms |
 | *_with_gen() (before) | 858ms |
 | *_with_gen() (after)  | 810ms |

commit-graph at v5.7:

 | Method                | Time  |
 |-----------------------+-------|
 | *_no_gen()            | 625ms |
 | *_with_gen() (before) | 572ms |
 | *_with_gen() (after)  | 517ms |

commit-graph at v5.9:

 | Method                | Time  |
 |-----------------------+-------|
 | *_no_gen()            | 268ms |
 | *_with_gen() (before) | 224ms |
 | *_with_gen() (after)  | 202ms |

commit-graph at v5.10:

 | Method                | Time  |
 |-----------------------+-------|
 | *_no_gen()            |  72ms |
 | *_with_gen() (before) |  37ms |
 | *_with_gen() (after)  |   9ms |

Note that these are only modest improvements for the case where the two
independent commits are not in the commit-graph (not until v5.10). All
algorithms get faster as more commits are indexed, which is not a
surprise. However, the cost of walking extra commits is more and more
prevalent in relative terms as more commits are indexed. Finally, the
last case allows us to jump to the minimum generation between the last
two commits (that are actually independent) so we greatly reduce the
cost in that case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:34:34 -08:00
Derrick Stolee 3677773371 commit-reach: use heuristic in remove_redundant()
Reachability algorithms in commit-reach.c frequently benefit from using
the first-parent history as a heuristic for satisfying reachability
queries. The most obvious example was implemented in 4fbcca4e
(commit-reach: make can_all_from_reach... linear, 2018-07-20).

Update the walk in remove_redundant() to use this same heuristic. Here,
we are walking starting at the parents of the input commits. Sort those
parents and walk from the highest generation to lower. Each time, use
the heuristic of searching the first parent history before continuing to
expand the walk.

The order in which we explore the commits matters, so update
compare_commits_by_gen to break generation number ties with commit date.
This has no effect when the commits are in a commit-graph file with
corrected commit dates computed, but it will assist when the commits are
in the region "above" the commit-graph with "infinite" generation
number. Note that we cannot shift to use
compare_commits_by_gen_then_commit_date as the method prototype is
different. We use compare_commits_by_gen for QSORT() as opposed to as a
priority function.

The important piece is to ensure we short-circuit the walk when we find
that there is a single non-redundant commit. This happens frequently
when looking for merge-bases or comparing several tags with 'git
merge-base --independent'. Use a new count 'count_still_independent' and
if that hits 1 we can stop walking.

To update 'count_still_independent' properly, we add use of the RESULT
flag on the input commits. Then we can detect when we reach one of these
commits and decrease the count. We need to remove the RESULT flag at
that moment because we might re-visit that commit when popping the
stack.

We use the STALE flag to mark parents that have been added to the new
walk_start list, but we need to clear that flag before we start walking
so those flags don't halt our depth-first-search walk.

On my copy of the Linux kernel repository, the performance of 'git
merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11
seconds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:34:34 -08:00
Derrick Stolee c8d693e1e6 commit-reach: move compare_commits_by_gen
Move this earlier in the file so it can be used by more methods.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:34:34 -08:00
Derrick Stolee fbc21e3fbb commit-reach: use one walk in remove_redundant()
The current implementation of remove_redundant() uses several calls to
paint_down_to_common() to determine that commits are independent of each
other. This leads to quadratic behavior when many inputs are passed to
commands such as 'git merge-base'.

For example, in the Linux kernel repository, I tested the performance
by passing all tags:

 git merge-base --independent $(git for-each-ref refs/tags --format="$(refname)")

(Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do
not point to commits.)

Here is the performance improvement introduced by this change:

 Before: 16.4s
  After:  1.1s

This performance improvement requires the commit-graph file to be
present. We keep the old algorithm around as remove_redundant_no_gen()
and use it when generation_numbers_enabled() is false. This is similar
to other algorithms within commit-reach.c. The new algorithm is
implemented in remove_redundant_with_gen().

The basic approach is to do one commit walk instead of many. First, scan
all commits in the list and mark their _parents_ with the STALE flag.
This flag will indicate commits that are reachable from one of the
inputs, except not including themselves. Then, walk commits until
covering all commits up to the minimum generation number pushing the
STALE flag throughout.

At the end, we need to clear the STALE bit from all of the commits
we walked. We move the non-stale commits in 'array' to the beginning of
the list, and this might overwrite stale commits. However, we store an
array of commits that started the walk, and use clear_commit_marks() on
each of those starting commits. That method will walk the reachable
commits with the STALE bit and clear them all. This makes the algorithm
safe for re-entry or for other uses of those commits after this walk.

This logic is covered by tests in t6600-test-reach.sh, so the behavior
does not change. This is tested both in the case with a commit-graph and
without.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:34:34 -08:00
Christian Walther 3a837b58e3 doc: mention bigFileThreshold for packing
Knowing about the core.bigFileThreshold configuration variable is
helpful when examining pack file size differences between repositories.
Add a reference to it to the manpages a user is likely to read in this
situation.

Capitalize CONFIGURATION for consistency with other pages having such a
section.

Signed-off-by: Christian Walther <cwalther@gmx.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:18:30 -08:00
Jonathan Tan 5476e1efde fetch-pack: print and use dangling .gitmodules
Teach index-pack to print dangling .gitmodules links after its "keep" or
"pack" line instead of declaring an error, and teach fetch-pack to check
such lines printed.

This allows the tree side of the .gitmodules link to be in one packfile
and the blob side to be in another without failing the fsck check,
because it is now fetch-pack which checks such objects after all
packfiles have been downloaded and indexed (and not index-pack on an
individual packfile, as it is before this commit).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 12:07:40 -08:00
Jonathan Tan b664e9ffa1 fetch-pack: with packfile URIs, use index-pack arg
Unify the index-pack arguments used when processing the inline pack and
when downloading packfiles referenced by URIs. This is done by teaching
get_pack() to also store the index-pack arguments whenever at least one
packfile URI is given, and then when processing the packfile URI(s),
using the stored arguments.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 12:07:40 -08:00
Jonathan Tan 27e35ba6c6 http-fetch: allow custom index-pack args
This is the next step in teaching fetch-pack to pass its index-pack
arguments when processing packfiles referenced by URIs.

The "--keep" in fetch-pack.c will be replaced with a full message in a
subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 12:07:40 -08:00
Jonathan Tan 726b25a91b http: allow custom index-pack args
Currently, when fetching, packfiles referenced by URIs are run through
index-pack without any arguments other than --stdin and --keep, no
matter what arguments are used for the packfile that is inline in the
fetch response. As a preparation for ensuring that all packs (whether
inline or not) use the same index-pack arguments, teach the http
subsystem to allow custom index-pack arguments.

http-fetch has been updated to use the new API. For now, it passes
--keep alone instead of --keep with a process ID, but this is only
temporary because http-fetch itself will be taught to accept index-pack
parameters (instead of using a hardcoded constant) in a subsequent
commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 12:07:40 -08:00
Pratyush Yadav b1056f60b6 Merge branch 'py/commit-comments'
Use git-stripspace to remove comment lines from the commit message. Also
use it to clean up whitespace instead of rolling our own logic.

* py/commit-comments:
  git-gui: remove lines starting with the comment character
2021-02-22 20:19:53 +05:30
Junio C Hamano 1b5b8cf072 Documentation: typofix --column description
f4ed0af6 (Merge branch 'nd/columns', 2012-05-03) brought in three
cut-and-pasted copies of malformatted descriptions.  Let's fix them
all the same way by marking the configuration variable names up as
monospace just like the command line option `--column` is typeset.

While we are at it, correct a missing space after the full stop that
ends the sentence.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-19 19:36:47 -08:00
Derrick Stolee a43a2e6c2a chunk-format: add technical docs
The chunk-based file format is now an API in the code, but we should
also take time to document it as a file format. Specifically, it matches
the CHUNK LOOKUP sections of the commit-graph and multi-pack-index
files, but there are some commonalities that should be grouped in this
document.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 13:38:16 -08:00
Derrick Stolee 5387fefadc chunk-format: restore duplicate chunk checks
Before refactoring into the chunk-format API, the commit-graph parsing
logic included checks for duplicate chunks. It is unlikely that we would
desire a chunk-based file format that allows duplicate chunk IDs in the
table of contents, so add duplicate checks into
read_table_of_contents().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 13:38:16 -08:00
Derrick Stolee 329fac3a36 midx: use 64-bit multiplication for chunk sizes
When calculating the sizes of certain chunks, we should use 64-bit
multiplication always. This allows us to properly predict the chunk
sizes without risk of overflow.

Other possible overflows were discovered by evaluating each
multiplication in midx.c and ensuring that at least one side of the
operator was of type size_t or off_t.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 13:38:16 -08:00
Derrick Stolee 6ab3b8b8b8 midx: use chunk-format read API
Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). In particular, we
can use the return value of pair_chunk() to generate an error when a
required chunk is missing.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 13:38:16 -08:00
Derrick Stolee 2692c2f6fd commit-graph: use chunk-format read API
Instead of parsing the table of contents directly, use the chunk-format
API methods read_table_of_contents() and pair_chunk(). While the current
implementation loses the duplicate-chunk detection, that will be added
in a future change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 13:38:16 -08:00
Derrick Stolee 5f0879f54b chunk-format: create read chunk API
Add the capability to read the table of contents, then pair the chunks
with necessary logic using read_chunk_fn pointers. Callers will be added
in future changes, but the typical outline will be:

 1. initialize a 'struct chunkfile' with init_chunkfile(NULL).
 2. call read_table_of_contents().
 3. for each chunk to parse,
    a. call pair_chunk() to assign a pointer with the chunk position, or
    b. call read_chunk() to run a callback on the chunk start and size.
 4. call free_chunkfile() to clear the 'struct chunkfile' data.

We are re-using the anonymous 'struct chunkfile' data, as it is internal
to the chunk-format API. This gives it essentially two modes: write and
read. If the same struct instance was used for both reads and writes,
then there would be failures.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 13:38:16 -08:00
Derrick Stolee 63a8f0e9b9 midx: use chunk-format API in write_midx_internal()
The chunk-format API allows writing the table of contents and all chunks
using the anonymous 'struct chunkfile' type. We only need to convert our
local chunk logic to this API for the multi-pack-index writes to share
that logic with the commit-graph file writes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 13:38:16 -08:00